Structural Analysis of Arabidopsis thaliana Chromosome 5. X ...



DNA RESEARCH 7, 31-63 (2000) Short Communication

Structural Analysis of Arabidopsis thaliana Chromosome 5. X. Sequence Features of the Regions of 3,076,755 bp Covered by Sixty P1 and TAC Clones

Shusei SATO, Yasukazu NAKAMURA, Takakazu KANEKO, Tomohiko KATOH, Erika ASAMIZU,

Hirokazu KOTANI, and Satoshi TABATA*

Kazusa DNA Research Institute, 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan

(Received 24 January 2000)

Abstract

In our ongoing project to deduce the nucleotide sequence of Arabidopsis thaliana chromosome 5, non-redundant P1 and TAC clones have been sequenced on the basis of the fine physical map, and as of January 2000, the sequences of 16.6 Mb representing approximately 60% of chromosome 5 have been accumulated and released at our web site. Along with the sequence determination, structural features of the sequenced regions have been analyzed by applying a variety of computer programs, and we already predicted a total of 2697 potential protein coding genes in the 11,166,130 bp regions, which are covered by 159 P1 and TAC clones. In this paper, we describe the structural features of the 3,076,755 bp regions covered by newly analyzed 60 P1 and TAC clones. A total of 715 potential protein coding genes were identified, giving an average density of the genes identified of 1 gene per 4001 bp. Introns were observed in 80% of the genes, and the average number per gene and the average length of the introns were 4.5 and 147 bp, respectively. These sequence features are nearly identical to those in our latest report in which the data were compiled based on a new standard of gene assignment including the computer-predicted hypothetical genes. The regions also contained 12 tRNA genes when searched by similarity to reported tRNA genes and the tRNAscan-SE program. The sequence data and information on the potential genes are available through the World Wide Web database KAOS (Kazusa Arabidopsis data Opening Site) at http://www.kazusa.or.jp/kaos/.

Key words: Arabidopsis thaliana chromosome 5; genomic sequence; P1 genomic library; TAC genomic library; gene prediction

Communicated by Mituru Takanami

* To whom correspondence should be addressed. Tel. +81-438-52-3933, Fax. +81-438-52-3934, E-mail: [email protected]

Downloaded from https://academic.oup.com/dnaresearch/article-abstract/7/1/31/389236 by guest on 26 March 2018

In order to investigate the whole genetic system in higher plants, we have been operating a sequencing project of the genome of a dicot model plant Arabidopsis thaliana. Of five chromosomes that constitute the A. thaliana genome of approximately 120 Mb, we focused our efforts on chromosomes 5 and 3. For precise localization of the clones for DNA sequencing, we constructed the fine physical maps of both chromosomes with clones from YAC, P1, TAC, and BAC libraries.¹,² On the basis of the fine physical map information, P1 and TAC clones were selected and assigned on the map by polymerase chain reaction (PCR), and then subjected to sequence analysis. As of January 2000, the regions of 16.6 Mb representing approximately 60% of chromosome 5 have been sequenced and the data are available at our web site KAOS (Kazusa Arabidopsis data Opening Site, http://www.kazusa.or.jp/kaos/). In parallel, potential genes in the sequenced regions have been analyzed using a variety of computer programs for similarity search and gene modeling, and we so far predicted the potential genes in a total of 11,166,130 bp which are represented by 159 P1 and TAC clones.³⁻¹¹ In this paper, we newly investigated the structural features of the 3,076,755 bp regions covered by an additional 60 P1 and TAC clones.

1. Isolation and Sequencing of P1 and TAC Clones

DNA sources and the method of clone isolation were essentially the same as described in the previous paper.³ The P1 and TAC clones containing the DNA regions which cover a total of 60 DNA markers on chromosome 5 were isolated by screening the Mitsui P1¹² and TAC¹³ libraries by means of PCR using primers designed from the sequence information of DNA markers. The DNA markers and selected clones are listed in Table 1. Relative positions of the markers and the sequenced clones on chromosome 5 are shown in Fig. 1. The relative orientation of each clone and contig on the chromosome has been confirmed by anchoring both ends of the clone to those at the corresponding positions of the contig map.

The nucleotide sequence of each P1 or TAC insert was determined according to the bridging shotgun method described previously.³ The length of the nucleotide sequence of each P1 or TAC insert finally confirmed is listed together with the accession numbers in Table 1.

2. Assignment of Potential Coding Regions

For assignment of the protein coding regions and gene modeling, similarity search and computer prediction were performed as described in the previous paper.³ Briefly, similarity search against the non-redundant protein sequence database nr (compiled by NCBI) was carried out using the BLASTX¹⁴ program. In parallel, the positions of potential protein coding regions were predicted with the Grail,¹⁵ GENSCAN¹⁶ and NetGene2¹⁷ computer programs. The transcribed regions were assigned by comparison of the nucleotide sequences with Arabidopsis ESTs¹⁸,¹⁹ in the public databases using the BLASTN program.¹⁴ All the results obtained were compiled with the aid of our new web-based tool, named Arabidopsis Genome Displayer (manuscript in preparation), then assignment of the potential protein coding genes was carried out by taking both similarity to known genes and computer prediction into consideration. Therefore, the regions predicted only by the computer programs with no apparent similarity to known genes were also assigned as genes. This standard of gene assignment has been adopted since the analysis in our last report,¹¹ while such computer-predicted hypothetical genes were not included in the earlier analyses.³⁻¹⁰ To sum up, 715 potential protein-coding genes as well as 54 partial genes located at the terminal regions of the clones and 43 pseudogenes were assigned in the 3,076,755 bp regions, giving an average gene density of 1 gene per 4001 bp. This value is lower than that in our latest report¹¹ in which the data were compiled based on a new standard of gene assignment described above, and is higher than that observed in regions of chromosomes 2²⁰ and 4.²¹ The reason for this inconsistency is thought to be the difference in the ratio of heterochromatic regions within the analyzed sequences.
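The reported gene density can be checked with simple arithmetic. The sketch below is not part of the original analysis: it assumes the density is the region length divided by a gene count, and it suggests that the figure of 1 gene per 4001 bp is reproduced when the 54 partial genes are counted together with the 715 complete ones. That reading is an inference from the numbers, not something the text states.

```python
# Figures quoted in the text above.
REGION_BP = 3_076_755   # total length of the analyzed regions
COMPLETE_GENES = 715    # potential protein-coding genes with complete structure
PARTIAL_GENES = 54      # partial genes at the terminal regions of the clones


def bp_per_gene(region_bp: int, gene_count: int) -> float:
    """Return the average number of base pairs per assigned gene."""
    if gene_count <= 0:
        raise ValueError("gene_count must be positive")
    return region_bp / gene_count


# Assumption: density = region length / number of genes counted.
complete_only = bp_per_gene(REGION_BP, COMPLETE_GENES)
with_partial = bp_per_gene(REGION_BP, COMPLETE_GENES + PARTIAL_GENES)

print(f"complete genes only: 1 gene per {complete_only:.0f} bp")  # 4303
print(f"complete + partial:  1 gene per {with_partial:.0f} bp")   # 4001
```

Pseudogenes are left out of both counts here; adding the 43 pseudogenes would give a smaller spacing than the reported value.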

In addition to the protein-coding regions, the RNA coding regions were assigned on the basis of sequence similarity to the reported structural RNAs. For tRNA genes, the prediction by the tRNAscan-SE program²² was also taken into account. As a result, 12 tRNA genes corresponding to 12 amino acid species and genes for U1, U3 and U4 snRNAs were identified in the 3,076,755 bp regions. Both potential protein and RNA coding genes are denoted by numbers with the clone names followed by sequential numbers from one end to another of the insert, which are listed in the table below the figure, and are also schematically represented in Fig. 2.

[Figure 1: physical map of chromosome 5. Vertical scale: length (Mbp), 0 to 30. Markers shown at the left: mi121, mi97, mi174, mi322, mi438, mi138, mi433, mi90, mi219, mi125, mi137, PHYC, mi323, mi194, mi83, mi61, nga129, g4130, mi69, mi70, mi184, g2455, mi335. The 60 sequenced P1 and TAC clones listed in Table 1 are shown at the right.]

Figure 1. Relative locations of the sequenced P1 and TAC clones and the associated markers on the physical map of chromosome 5. The positions of DNA markers used for P1 and TAC isolation and of other major DNA markers were localized on the map on the basis of the YAC tiling path and map information in ref. 1. The vertical open bar represents the entire length of chromosome 5. The names of P1 and TAC clones are given at the right side, and those of markers at the left side. The distance (Mbp) from the telomeric site of the top arm is given in the vertical scale.

Table 1. Information of the sequenced P1 and TAC clones.

Clone name   DNA markers                 Confirmed length (bp)   Accession number
K1L20        ends of K2A18 & K1F13       47665                   AB022211
K1O13        ends of MEE6 & MYC6         25275                   AB019225
K2K18        CIC11F1L                    41465                   AB023031
K2N11        ends of MFC19 & MRA19       30340                   AB022213
K5A21        MDJ22_right end             13880                   AB024030
K5F14        CIC5H10R                    31178                   AB022214
K5J14        ends of MJC20 & MDH9        59762                   AB023032
K6A12        MXI22_left end              64136                   AB024031
K6M13        nga129                      77129                   AB023033
K7J8         K21P3_right end             56963                   AB023034
K9E15        ends of K18C1 & MFC19       62052                   AB020744
K9H21        ends of MDC12 & MLE2        15319                   AB023035
K9P8         MPF21_right end             70670                   AB024032
K11I1        MPL12_right end             51529                   AB019223
K14B20       K2A18_left end              40251                   AB018108
K15O15       CIC10E5R                    23026                   AB024026
K16E1        ends of MDH9 & MFO20        33963                   AB022210
K16F13       ends of MCA23 & MDN11       19742                   AB024025
K17N15       CIC11F10                    81293                   AB018109
K17O22       ends of K21C13 & K18C1      67720                   AB019224
K18B18       ends of K19M22 & MNC17      35896                   AB024027
K20J1        ends of K19E20 & K21P3      36243                   AB023028
K21H1        CIC3B1                      74342                   AB020742
K21L19       MCK7_left end               41087                   AB024029
K22G18       ends of MAC9 & MTG10        45453                   AB022212
K22J17       ends of MPA24 & K14B20      11211                   AB020743
K24C1        MDA7_right end              29498                   AB023029
K24M7        CIC11F10                    73999                   AB019226
MAB16        CIC10E5L                    70475                   AB018112
MBM17        ends of MGI19 & MHJ24       52717                   AB019227
MCK7         mi184                       87090                   AB019228
MFB16        CIC11F10                    66087                   AB023037
MGO3         MMN10_right end             43570                   AB019231
MHM17        CIC10B4L                    78423                   AB024035
MIF21        MDN11_right end             59372                   AB023039
MJB24        MSF19_left end              58589                   AB019233
MJE7         K15N18_left end             74298                   AB020745
MJM18        ends of MIO24 & MSG15       16203                   AB025623
MJP23        ends of K19P17 & K18G13     31827                   AB018115
MKN22        ends of MCD7 & MIK19        27229                   AB019234
MMI9         ends of MTG10 & K19B1       81736                   AB019235
MNB8         MXC20_right end             46872                   AB018116
MNI5         K6M13_right end             21011                   AB025627
MPI10        mi69                        29605                   AB020747
MQL5         CIC11B8L                    88398                   AB018117
MQM1         ends of K19M13 & MRO11      81365                   AB025633
MRG21        ends of K19B1 & MQB2        55151                   AB020751
MSD23        ends of MZA15 & MQD22       33479                   AB022221
MSK10        CIC11F1                     81414                   AB024037
MSN2         ends of K1F13 & MUD21       62927                   AB018119
MUD12        ends of MSN9 & MYH19        22601                   AB022222
MUF8         ends of MBK23 & K16L22      13776                   AB025635
MUL3         MJB24_left end              82010                   AB023042
MWD22        K3K7_right end              87180                   AB023044
MWF20        CIC5F12L                    91193                   AB025638
MWJ3         MDF20_right end             42356                   AB018120
MWP19        ends of MXH1 & MIK22        11026                   AB020753
MXK3         CIC4B2L                     81494                   AB019236
MYN8         ends of K19E1 & MNC6        54528                   AB020754
MZN1         K19M22_left end             81672                   AB020755
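The gene identifiers used in this paper combine a clone name with a sequential number along the insert (for example K2K18.4 or MRG21.11). The following is a minimal sketch of how such identifiers could be parsed and sorted numerically; the regular expression and the names `GeneId` and `parse_gene_id` are assumptions made for illustration and are not part of the KAOS database.

```python
import re
from typing import List, NamedTuple


class GeneId(NamedTuple):
    clone: str   # P1 or TAC clone name, e.g. "MRG21"
    index: int   # sequential gene number along the insert


# Assumed pattern: upper-case letters and digits, a dot, then a positive integer.
_GENE_ID = re.compile(r"([A-Z][A-Z0-9]*)\.([1-9][0-9]*)")


def parse_gene_id(text: str) -> GeneId:
    """Split an identifier such as 'MRG21.11' into clone name and gene number."""
    match = _GENE_ID.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a clone.number identifier: {text!r}")
    return GeneId(match.group(1), int(match.group(2)))


def sort_gene_ids(identifiers: List[str]) -> List[str]:
    """Sort by clone name, then by gene number (numeric, not lexicographic)."""
    return sorted(identifiers, key=parse_gene_id)


# Identifiers taken from the text of this paper.
ids = ["MRG21.12", "MMI9.13", "K2K18.4", "MMI9.2", "MRG21.11"]
print(sort_gene_ids(ids))
```

Sorting on the parsed tuple keeps MMI9.2 ahead of MMI9.13, which a plain string sort would not.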

3. Structural Features of Potential Protein Genes

In this paper, the complete structures of 715 potential protein coding genes were predicted. Structural features of these genes as well as those of 2619 genes including those previously identified are listed in Table 2. The 2619 genes account for approximately 13.1% of the total gene constituents (2 x 10^4 genes) assumed for A. thaliana. Approximately 77% of the 2619 protein-coding genes contained introns, and the average number of introns per gene and their average length were 4.0 and 167 bp, respectively.
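The percentages quoted in this section can be re-derived from the counts in Table 2. The sketch below is an illustration rather than part of the original analysis; the counts of genes with introns (575 and 2012) are read from Table 2, and the total of 2 x 10^4 genes is the estimate assumed in the text.

```python
ASSUMED_TOTAL_GENES = 20_000  # total gene number assumed for A. thaliana in the text

# (number of genes, number of genes with introns), as read from Table 2.
DATASETS = {
    "this study": (715, 575),
    "cumulative": (2619, 2012),
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


for name, (genes, with_introns) in DATASETS.items():
    print(f"{name}: {percent(with_introns, genes):.1f}% of genes contain introns")

cumulative_genes = DATASETS["cumulative"][0]
share = percent(cumulative_genes, ASSUMED_TOTAL_GENES)
print(f"cumulative genes are {share:.1f}% of the assumed total")
```

Under these assumptions the three values come out at about 80.4%, 76.8% and 13.1%, in line with the 80%, 77% and 13.1% quoted in the abstract and in this section.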

4. Expression Level of Potential Protein Genes and Gene Segments

The nucleotide sequence of each of the potential protein coding genes was compared with those in the Arabidopsis EST database, and the number of matched Arabidopsis ESTs was counted to monitor the transcriptional level of each gene. Of 715 complete and 54 partial genes that we have identified in chromosome 5 in this study, 290 carried matched ESTs. The putative products of the genes hit by 10 or more EST files, which suggests that they belong to a class of highly expressed genes, include those showing sequence similarity to multicatalytic endopeptidase complex, proteasome component, alpha subunit in A. thaliana (K2K18.4), xylosidase in Aspergillus niger (K7J8.3), hypothetical protein in A. thaliana (K18B18.8), subtilisin-like proteinase homolog in A. thaliana (K18B18.9), outer membrane lipoprotein Blc precursor in Citrobacter freundii (K21L19.6), 26S protease regulatory subunit 6B homolog in Solanum tuberosum (MCK7.16), unknown protein in A. thaliana (MIF21.5), RNA helicase in A. thaliana (MMI9.2), 40S ribosomal protein S20 in A. thaliana (MMI9.13), tubulin beta-2/beta-3 chain in A. thaliana (MRG21.11 and MRG21.12), cytoplasmic malate dehydrogenase in A. thaliana (MWF20.2), NOI protein in A. thaliana (MWJ3.3), and glutamate synthase precursor in Medicago sativa (MYN8.7).
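The EST-based classification described above amounts to counting matched ESTs per gene and applying a threshold of 10. A minimal sketch follows; the gene names and counts in `example_counts` are invented placeholders, not data from this study, and only the threshold and the totals 290, 715 and 54 come from the text.

```python
from typing import Dict, List

HIGH_EXPRESSION_MIN_ESTS = 10  # "hit by 10 or more EST files" in the text


def highly_expressed(est_counts: Dict[str, int],
                     min_ests: int = HIGH_EXPRESSION_MIN_ESTS) -> List[str]:
    """Return the genes whose matched-EST count reaches the threshold, sorted by name."""
    return sorted(gene for gene, n in est_counts.items() if n >= min_ests)


def matched_fraction(genes_with_ests: int, total_genes: int) -> float:
    """Fraction of genes that carry at least one matched EST."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return genes_with_ests / total_genes


# Hypothetical counts for illustration only.
example_counts = {"cloneA.1": 0, "cloneA.2": 12, "cloneA.3": 10, "cloneB.1": 3}
print(highly_expressed(example_counts))

# 290 of the 715 complete and 54 partial genes carried matched ESTs.
print(f"{matched_fraction(290, 715 + 54):.1%}")
```

With the totals from the text, roughly 38% of the assigned genes have EST support.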

The sequence data as well as the gene information shown in this paper are available through the World Wide Web at http://www.kazusa.or.jp/kaos/.

Acknowledgements: We thank S. Sasamoto for excellent technical assistance. Thanks are also due to T. Kimura, T. Hosouchi, K. Idesawa, K. Kawashima, M. Matsumoto, A. Matsuno, A. Muraki, N. Nakazaki, S. Shinpo, C. Takeuchi, T. Wada, A. Watanabe, M. Yamada, and M. Yasuda for their excellent teamwork in sequence analysis. We are grateful to A. Tanaka for technical advice, and Mitsui Plant Biotechnology Research Institute and Arabidopsis Biological Resource Center at the Ohio State University for providing the DNA markers and the DNA libraries. This work was supported by the Kazusa DNA Research Institute Foundation.


Table 2. Structural features of potential protein coding genes in A. thaliana chromosome 5.

Features                              715 genes (a)      2619 genes (b)
Gene length (bp) including introns    74-14479 (1993)    62-14479 (1965)
Product length (amino acids)          25-2216 (445)      19-2756 (433)
Genes with introns                    575                2012
Number of introns/gene                0-42 (4.5)         0-42 (4.0)
Exon length (bp)                      3-4473 (245)       2-4473 (260)
Intron length (bp)                    26-1450 (147)      8-5405 (167)
GC content of exons                   43%                43%
GC content of introns                 32%                32%

Structural features of the potential protein-coding genes assigned so far are listed. The 715 genes (a) are assigned based on the new standard in this study, and the 2619 genes (b) include previously assigned 1901 potential protein genes. Average values are shown in parentheses.

References

1. Kotani, H., Sato, S., Liu, Y.-G. et al. 1997, A fine physical map of Arabidopsis thaliana chromosome 5: Construction of a sequence-ready contig map, DNA Res., 4, 371-378.

2. Sato, S., Kotani, H., Hayashi, R. et al. 1998, A physical map of Arabidopsis thaliana chromosome 3 represented by two contigs of CIC YAC, P1, TAC and BAC clones, DNA Res., 5, 163-168.

3. Sato, S., Kotani, H., Nakamura, Y. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. I. Sequence features of the 1.6 Mb regions covered by twenty physically assigned P1 clones, DNA Res., 4, 215-230.

4. Kotani, H., Nakamura, Y., Sato, S. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. II. Sequence features of the regions of 1,044,062 bp covered by thirteen physically assigned P1 clones, DNA Res., 4, 291-300.

5. Nakamura, Y., Sato, S., Kaneko, T. et al. 1997, Structural analysis of Arabidopsis thaliana chromosome 5. III. Sequence features of the regions of 1,191,918 bp covered by seventeen physically assigned P1 clones, DNA Res., 4, 401-414.

6. Sato, S., Kaneko, T., Kotani, H. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. IV. Sequence features of the regions of 1,456,315 bp covered by nineteen physically assigned P1 and TAC clones, DNA Res., 5, 41-54.

7. Kaneko, T., Kotani, H., Nakamura, Y. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. V. Sequence features of the regions of 1,381,565 bp covered by twenty one physically assigned P1 and TAC clones, DNA Res., 5, 131-145.

8. Kotani, H., Nakamura, Y., Sato, S. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VI. Sequence features of the regions of 1,367,185 bp covered by 19 physically assigned P1 and TAC clones, DNA Res., 5, 203-216.

9. Nakamura, Y., Sato, S., Asamizu, E. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VII. Sequence features of the regions of 1,013,767 bp covered by sixteen physically assigned P1 and TAC clones, DNA Res., 5, 297-308.

10. Asamizu, E., Sato, S., Kaneko, T. et al. 1998, Structural analysis of Arabidopsis thaliana chromosome 5. VIII. Sequence features of the regions of 1,081,958 bp covered by seventeen physically assigned P1 and TAC clones, DNA Res., 5, 379-391.

11. Kaneko, T., Kato, T., Sato, S. et al. 1999, Structural analysis of Arabidopsis thaliana chromosome 5. IX. Sequence features of the regions of 1,011,550 bp covered by seventeen P1 and TAC clones, DNA Res., 6, 183-195.

12. Liu, Y.-G., Mitsukawa, N., Vazquez-Tello, A., and Whittier, R. F. 1995, Generation of a high-quality P1 library of Arabidopsis suitable for chromosome walking, Plant J., 7, 351-358.

13. Liu, Y.-G., Shirano, Y., Fukaki, H. et al. 1999, Complementation of plant mutants with large genomic DNA fragments by a transformation-competent artificial chromosome vector accelerates positional cloning, Proc. Natl. Acad. Sci. USA, 96, 6535-6540.

14. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. 1990, Basic local alignment search tool, J. Mol. Biol., 215, 403-410.

15. Uberbacher, E. C. and Mural, R. J. 1991, Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach, Proc. Natl. Acad. Sci. USA, 88, 11261-11265.

16. Burge, C. and Karlin, S. 1997, Prediction of complete gene structures in human genomic DNA, J. Mol. Biol., 268, 78-94.

17. Hebsgaard, S. M., Korning, P. G., Tolstrup, N. et al. 1996, Splice site prediction in Arabidopsis thaliana DNA by combining local and global sequence information, Nucl. Acids Res., 24, 3439-3452.

18. Newman, T., Bruijn, F. J., and Green, P. 1994, Genes galore: A summary of methods for accessing results from large-scale partial sequencing of anonymous Arabidopsis cDNA clones, Plant Physiol., 106, 1241-1255.

19. Cooke, R., Raynal, M., Laudie, M. et al. 1996, Further progress towards a catalogue of all Arabidopsis genes: analysis of a set of 5000 non-redundant ESTs, Plant J., 9, 101-124.

20. Lin, X., Kaul, S., Rounsley, S. et al. 1999, Sequence and analysis of chromosome 2 of the plant Arabidopsis thaliana, Nature, 402, 761-768.

21. Mayer, K., Schuller, C., Wambutt, R. et al. 1999, Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana, Nature, 402, 769-777.

22. Lowe, T. M. and Eddy, S. R. 1997, tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence, Nucl. Acids Res., 25, 955-964.


K11I1 (51529 bp)

[Figure 2 panel for K11I1: gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand, followed by the table of deduced genes K11I1.1 to K11I1.13. Table columns: identifier, direction, position (5' and 3'), No. of exons, No. of ESTs, length, and information on the most similar sequence (sequence ID, overlap, identity, definition). The individual table values are not legible in this transcript.]

K14B20 (40251 bp)

[Figure 2 panel for K14B20: gene map with the same tracks, followed by the table of deduced genes K14B20.1 to K14B20.12 with the same columns. The individual table values are not legible in this transcript.]

Figure 2. Gene organization in the 60 P1 and TAC clones. Positions of the identified or predicted genes in each insert of the P1 and TAC clones are schematically represented by color-coded boxes above (rightward) and below (leftward) the wide line in the middle, which represents the entire insert sequence. The length of the sequenced region in each insert is given in parentheses together with the clone name at the top. The names of the adjacent overlapping clones whose sequences had been reported are shown on the middle bars. Arrowheads indicate the directions of the DNA strands (5' to 3'). Dark and faint blue bars with numbers represent the positions of the assigned potential protein coding genes, and pseudo and partial genes, respectively, and red bars the positions of RNA coding genes. Gray bars indicate the positions of the regions which matched the Arabidopsis ESTs. The regions which showed similarity to the sequences in the protein database are shown by yellow, orange and red bars, which correspond to BLASTX scores of 70-100, 100-250, and 250 or more, respectively. The green bars indicate the positions of the potential exons predicted by the Grail program. The three different colors with increasing depth correspond to the regions with Grail scores of less than 70, 70-90, and 90 or more, respectively. The potential protein and RNA coding genes assigned as described in the text are listed below each of the figures. In this table, the number of amino acid residues and nucleotide length (in italic) of putative gene products of the respective potential protein and RNA coding genes are indicated.
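The score classes used for coloring in Figure 2 can be written as two small lookup functions. This is a sketch of the binning described in the caption, not the code of the Arabidopsis Genome Displayer; the caption does not say which class a boundary score (exactly 100, 250, 70 or 90) belongs to, so assigning boundaries to the higher class is an assumption, as are the shade labels "light", "medium" and "dark".

```python
from typing import Optional


def blastx_color(score: float) -> Optional[str]:
    """Map a BLASTX score to the bar color described in the Figure 2 caption.

    Scores below 70 are not displayed, so None is returned for them.
    Assumption: a boundary score falls into the higher class.
    """
    if score >= 250:
        return "red"
    if score >= 100:
        return "orange"
    if score >= 70:
        return "yellow"
    return None


def grail_shade(score: float) -> str:
    """Map a Grail exon score to one of three green shades of increasing depth."""
    if score >= 90:
        return "dark"    # 90 or more
    if score >= 70:
        return "medium"  # 70-90
    return "light"       # less than 70


for s in (65, 85, 180, 400):
    print(s, blastx_color(s), grail_shade(s))
```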

K15O15 (23026 bp)

[Figure 2 panel for K15O15: gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand, followed by the table of deduced genes K15O15.1 to K15O15.6. Table columns: identifier, direction, position (5' and 3'), No. of exons, No. of ESTs, length, and information on the most similar sequence (sequence ID, overlap, identity, definition). The individual table values are not legible in this transcript.]

K16F13 (19742 bp)

[Figure 2 panel for K16F13: gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand, followed by the table of deduced genes K16F13.1 to K16F13.5. Table columns: identifier, direction, position (5' and 3'), No. of exons, No. of ESTs, length, and information on the most similar sequence (sequence ID, overlap, identity, definition). The individual table values are not legible in this transcript.]

K16E1 (33963 bp)

[Figure 2 panel for K16E1: gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand, followed by the table of deduced genes K16E1.1 to K16E1.6. Table columns: identifier, direction, position (5' and 3'), No. of exons, No. of ESTs, length, and information on the most similar sequence (sequence ID, overlap, identity, definition). The numeric values are not reliably legible in this transcript. Definitions of the most similar sequences, as far as legible: (AF064257) Dhm1-like protein, Homo sapiens; (AF019630) pathogenicity protein, Magnaporthe grisea; (AC002131) contains similarity to BAP31 protein gb|X81816 from Mus musculus, A. thaliana; (Z97338) cytochrome P450 like protein, A. thaliana; (AC002340) putative cytochrome P450, A. thaliana (two entries).]

K18B18 (35896 bp)

[Figure 2 panel for K18B18: gene map with the tracks Grail exon, Protein db hit, EST db hit and Gene for each strand, followed by the table of deduced genes K18B18.1 to K18B18.10. Table columns: identifier, direction, position (5' and 3'), No. of exons, No. of ESTs, length, and information on the most similar sequence (sequence ID, overlap, identity, definition). The numeric values are not reliably legible in this transcript. Definitions of the most similar sequences, as far as legible: unknown protein; copper transporter 1; copper transporter protein; tRNA-Asp(GTC); [pseudo] putative non-LTR retroelement, A. thaliana; putative protein, A. thaliana; hypothetical protein, A. thaliana; subtilisin-like proteinase homolog, A. thaliana (two entries).]

K22J17 (11211 bp)

2 34I

GraiexonProtein db hit

ESTdbhit

Gene

Gene

ESTdbhit

Proton db hitGraiexon

deduced genes

identifierK22J17.1K22J17.2K22J17.3K22J17.4

Dir

-

l'onitiuitti 5"

9660217821

10341

3 'C40

70869731

11177

No.Exon

2

11

No. ofEST

0010

147219637279

Sequence ID«i|2129515|pir

Ki|2827705|emKi|2827704|eu,

||S71174

b|CAA10C78|b|CAA10677|

Overlap

629256

;e

Identity

94.998.4

Definition

(AL021684) predicted protein
(AL021684) LRR-like protein
A. thaliana


K17N15 (81293 bp)

[Figure: gene map of K17N15. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

identifier  Direction
K17N15.1 +
K17N15.2 +
K17N15.3 +
K17N15.4 -
K17N15.5 -
K17N15.6 +
K17N15.7 +
K17N15.8
K17N15.9 +
K17N15.10
K17N15.11 +
K17N15.12 +
K17N15.13
K17N15.14
K17N15.15 -
K17N15.16 +
K17N15.17 +
K17N15.18 +
K17N15.19
K17N15.20
K17N15.21 +

Position5

8415802

9008122581C507200872455525090300403837144382480805102055441590390415200322

08044739797078178825

3'28047912

117251418118413213902510030077308153938147185498005173057504020550537700740

73272709037782481028

No. ofExon

114

92201

1219

14515

1142

7330

No. ofEST

00

022300000302012

0000

Length

385581

592530540170204

1149800337C80292239419707182

78

1239501200337

1 kfomiatioii on the moot similarSequence IDKi|2944178gi j 2829890

g

gg

gggggggggg

g

K

g

|4454012|emb|CAA23000||0541707|emb|CAB51212.1|5541707|einb|CAB51212.1|

|4454013|emb|CAA23000||5903057|gb|AAD55C10.1|5174507|ref|NT .000923.1|4084340|gb|AAD20141.1|2583120|gb|AAB82029.1||0094555|gb| AAF03497.1|10400801 |gb|AAF13032.1||4454020|emb|CAA23073||4522009|gb| AAD21782.1||400983|»p|P311C4|3249066

|5302803|emb|CAB40044.1|5123925|emb|CAB45513.1|

g |170C101|«u|Q105C9|

sequenceOverlap

384570

533501505

1971135055300050282

-15238748815134

930490

278

Identity100.0G1.5

82.800.0G0.1

03.5C0.231.055.058.907.022.900.442.908.082.9

50.300.2

38.4

Definition(AF007778) trelialo(AC002311) highlyKp|X00033| 18591 A(AL035390) Pollen-(AL0908C0) pectiue(AL090800) pectine

(AL030390) putativ

e-G-phosphate photiplmtrt.se A. tfiaJiaiMiaimihir to nuxin-regulated protein GH3.

thaliatta••pecific protein precursor like A. thtdiiUiHutertwe-like protein A. thulium**ter»*e-like protein A. thulUn*

e protein A. thulimia(AC008010) F0D8.33 A. thalianamitochoudrial inter(AC007127) uukuo.(AC002387) putativ(AC010070) unknow(AF113G1G) iutestii(AL03039G) putativ(AC0070C9) unkno.00S ribosomal prote(AC004473) SimilaESTs gb|F15433A. thalia/ia(Z97342) disease re(AL079350) putativ

[partial] cleavage a

nedijite peptidn^eii protein A. thaJUnue receptor-like protein kinttue A. thuliaiian protein A. thaluttui»t mucin 3 Homo oHpu-m,e protein A. tltnliniinn protein A. thalituiuin L l l . chloropltwt precursor (CL11)

to S. cerevi^e SIKlP protein Kb|984964.Hlid gb|AA390158 come from tin* ge»e.

wtHiice RPP5 like protein A- thaliai,*e protein A. th*li*i,n

id poly»denyUtion specificity factor, 160 kd»ubunit (CPSF 100 kd subunit)

K17O22 (67720 bp)

[Figure: gene map of K17O22. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

identifie. ID Ov,,b|CAA18120|Lb[CAB53784.1

rlap Identity n.t,.:t;».[partial] (AL022141) putative dis ^sistance protein A. thaliana

protein rps4-RLD A. thaliana

ity to TMV resistance protein N gb|U15000 from Nicotiana glutinosa. A. thaliana
(AC007918) Similar to gi|42G3048 TCA13.8 putative En/Spm transposon protein homolog (mosaic protein) from A. thaliana chromosome II sequence gb|AC000250.
(AF128394) contains similarity to Petunia PTTA' (GB:AF009510) A. thaliana
[pseudo] (AF128394) similar to Antirrhinum majus (garden snapdragon) TNP2 protein (GB:X07297) A. thaliana
(AL02448C) lectin like protein A. thaliana
(AL024480) putative protein A. thaliana
[pseudo] (U27090) Fe(II) transport protein A. thaliana
(AL02448C) putative protein A. thaliana
[pseudo] (AL024480) putative protein A. thaliana
[pseudo] (AC007259) Hypothetical protein A. thaliana
(Z97341) hypothetical protein A. thaliana
ras-related protein RHA1

K17O22.1 K17O22.2 K17O22.3 K17O22.4 K17O22.5 K17O22.7 K17O22.8 K17O22.9 K17O22.10 K17O22.11 K17O22.12 K17O22.13 K17O22.14 K17O22.15

28348098

12374

2001G8C0

1015814003

10292

18489 22319

230&239 565431125011052770547020179905291

257904053445742521275442158770032710C853

7231100354392

Ki|2901373|.gi|5823585|.gi|4203705|gb|AAD15391.1gi|3335340

305 «i|0272382|gb|AAF00088.1

0 108 gi|4325349|tfb|AAD17347

0 1277 ui|4325301|gb|AAD17349|

gi|3250093|emb|CAA19701.1gi|3250079|emb(CAA19087.1Ki 13532GGgi|3200070|emb|CAA19C83.1|gi|3200C74lemb|CAA19082.11gi|5734730[gb|AAD50001.1gi|2245012|emb|CAB10432.18i|400970|sp|P31582|

0000000

332294877593001

1308491200

7141112212380

292

840

1002931890124221320390199

05.053.1

90.2

40.2G8.0GO.372.301.044.138.0

100.0

(AJ249203) dis.(AC000223) putative dis.


K1L20 (47665 bp)

[Figure: gene map of K1L20. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

Columns: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length, Information on the most similar sequence (Sequence ID, Overlap, Identity, Definition)

identifier
K1L20.1 K1L20.2 K1L20.3 K1L20.4 K1L20.5 K1L20.6 K1L20.7 K1L20.8 K1L20.9 K1L20.10 K1L20.11 K1L20.12 K1L20.13 K1L20.14

00317800

10091130101400110089221003279237110

390834129840817

09048236

12147140971C022178282C8803389038308

397314272447427

224103

1421

324

10033001

1

000

1 0 0328

30699

4 4 9

3 0 1

2022 9 2

1037339418

89331147

K 6002270gi[1086086

«

|4400819303681016002270[0319186

J2901370006226710062200

euib|CAB62640.1|

Kb|AAD20127.1|eml>|CAA18000|enib|CAB62C40.1J8b|AAF07199.1

emb|CAA18122emb|CAB02637.1eijjb|CAB02030.1

|46C2629|t!l,jAAD26901.1|

49298031Kb, AAD34102.1 [12029680 Kbl AAC62808.il

270

30484

427360

2 8 09 2 4

282322

330-03

00.131.7

03.868.244.2

100.0

00.702.400.930.0

70.200.0

] (AL132980) liypotheti^l pri(U41007) »imil»r to G beta rePe»t» (PROSITE:PS00670)Ceeiivrhubditw elegiuiuIAC006201) unknuwii prottin A. tlmlitmti(AL022373) putative auxin-induced protein A. thuliuna(AL132980) hypothetical proteiu A. «,«««,.«(AF 1901 40) GDr-D-mauno»e 4.0-Jeliydrata.e

(AL022141) NAM like protein A. tlmliiuntIAL132980) putative protein A. thaliana(AL132980) traiwcriptiuu factor-like protein A. tUlwn(AC0072C7) putative leucine-rich repeat disease reciftance proteinA. tluiluuia

(AF102000) putative ainc fingt[partial] (AC002030) putativA. tlWi.™

!iu SHI A. tietl,yladeno

K1O13 (25275 bp)

[Figure: gene map of K1O13. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

identifieK1O13.K1O13.K1O13.K1O13.K1O13.K1O13-K1O13.K1O13.K1O13.K1O13.K1O13.

r Direction

-

-

-

+0 *1

0

6023742000073090086362237180621841010932726

3' Exc3008002969008991

13006130931010916810190802218820270

i i

8371

10132228

EST20204000000

090270242061487

72326280287287408

Sequence IDKi|4218144lenib|CAAl 0602.1Ki|6033849|nb|AAF19708.1Ki|4218144|emb|CAA10002.1Ki|0C33849|xb|AAF19708.1Ki|6633801|Kb| AAF19710.ilX72896

Ki|040C168|Kb|AAF09106.1Ki|6033800|Kb|AAF19719.1Ki|04001C8|Kb|AAF09100.1iKi|C456171|iibiAAF091O9.1

Overlap243

0G241484480

72

233233221407

Identity88.002.798.806.484.090.6

46.242.347.370.9

Definition
(AJ132398) glutathione transferase, GST 10b A. thaliana
(AC008047) F2K11.17 A. thaliana
(AJ132398) glutathione transferase, GST 10b A. thaliana
(AC008047) F2K11.17 A. thaliana
(AC008047) F2K11.13 A. thaliana
tRNA-Cys(GCA) A. thaliana
(AC011022) unknown protein A. thaliana
(AC008047) F2K11.4 A. thaliana
(AC011022) unknown protein A. thaliana
[partial] (AC011022) kinesin-like protein A. thaliana


K20J1 (36243 bp)

[Figure: gene map of K20J1. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

identifier  Overlap  Identity  Definition
K20J1.1 K20J1.2 K20J1.3 K20J1.4 K20J1.5 K20J1.6 K20J1.7 K20J1.8 K20J1.9 K20J1.10 K20J1.11 K20J1.12 K20J1.13

128411103100

53308069

12000140002310020200302063344035500

44921054497

71018580

13242202742308425G18318743432130243

1 96 X159330 166 X531750 350 gi|0067172|dbj|BAA88308.1!0 464 gi|5001734|6b|AAD37122.1

2 G09 gij0731761lemb!CAB525C2.10 116 Ki|304708G

Ki|6573707|gb|AAF17G87.1Ki|1871179|gb|AAB03539.1*i|4972060|enib|CAB43928.1Si[3757510|)!b|AACC4218.1gi|4850290|en,b|CAB43052.1

2000000

39C1487

150139502294210

96160102422

47100

70112484291190

100.0

41.1

50.8

83.3

09.2

4C.5

09.0

39.8

40.8

54.5

[partial] U4 snRNA P'&uuVI snRNA A. thalian*(AB028860) mDjlO M"« /(AF129M1) very-loiim-clidA. th*li*na(AL109819) extensin-like(AF058914) Minila.script Jact.himii. sec

•"fatty

: 72.31) A. thai™

(AC009243) F28K19.24 A. thaliana
[pseudo] (U90439) hypothetical protein
(AL078470) putative protein A. thaliana
(AC005107) putative disease resistance
[partial] (AL049870) RPP1-WsA-like
A. thaliana

K21H1 (74342 bp)

[Figure: gene map of K21H1. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

Columns: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length, Information on the most similar sequence (Sequence ID, Overlap, Identity, Definition)

identifier
K21H1.1 K21H1.2 K21H1.3 K21H1.4 K21H1.5 K21H1.6 K21H1.7 K21H1.8 K21H1.9 K21H1.10 K21H1.11 K21H1.12 K21H1.13 K21H1.14 K21H1.15 K21H1.16 K21H1.17 K21H1.18 K21H1.19 K21H1.20 K21H1.21

777290101422215010184902083129329332353550837379

39199

412444301800034053920051501775640020595373176

8494100021020317817205972783431233352223070938722

40000

432504543051180080510140503422650740959474339

130

0•>.

00100100

2411293447361492210272300228448

Ki|82777|pir(|A34959Ki|0523044|emb|CAB62312.1Ki|0050412|(!b|AAF02876.1gi(0523042|emb;CAB02310.1gi|1086249|pir||S52709Ki|0010010|sp|O48603|gi|4000880|enib|CAB10798.1t!i[2911008iemb;CAA17068Ki|4006878|errib|CAB10790.18i|5091501|dbj|BAA78730.1gi!6523039|emb|CAB62307.1

434 Ki|0023039|emb|CAB62307.1

382302184051317423492782388

gil0300263|gblAAD41995.1*i|1351940|sp|P47927«i|0023037|e,nb|CAB02305.gi|0523034|emb|CAB62302.«i IG523033 lemb ICABG2301.(!i|C3235G0|ref|Nrj013G31.1Ki|400G882|emb|CABlG800gi|6522931|emb|CAB62118KiiG022929)einb[CABG2110

239833430901193

11686200120442

51.7

45.2

70.2

45.6

00.2

59.8

42.5

06.7

04.3

57.3

131330163G33310298454423

03.0

34.850.008.9G7.881.045.206.454.062.7

(AL13297C) putative protein A. thaliana
(AC009520) Unknown protein A. thaliana
(AL132970) protein kinase-like protein A. thaliana
subtilisin-like protease - Alnus glutinosa
DNA polymerase alpha, catalytic subunit
(Z99707) putative protein A. thaliana
(AL021961) putative protein A. thaliana
(Z99707) MAP3K-like protein kinase A. thaliana
(AB023482) Hypothetical protein Oryza sativa
(AL13297C) anthranilate N-hydroxycinnamoyl/benzoyltransferase-like protein A. thaliana
(AL132976) anthranilate N-hydroxycinnamoyl/benzoyltransferase-like protein A. thaliana
(AC006233) unknown protein A. thaliana
floral homeotic protein APETALA2
(AL13297C) putative protein A. thaliana
(AL132970) receptor protein kinase-like protein A. thaliana
(AL13297C) putative protein A. thaliana
YmlOSOwp
(Z99707) UDP-glucuronyltransferase-like protein A. thaliana
(AL132978) hypothetical protein A. thaliana
[partial] (AL132978) putative protein A. thaliana


K21L19 (41087 bp)

[Figure: gene map of K21L19. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

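Columns: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length, Information on the most similar sequence (Sequence ID, Overlap, Identity, Definition)

identifier
K21L19.1 K21L19.2 K21L19.3 K21L19.4 K21L19.5 K21L19.6 K21L19.7 K21L19.8 K21L19.9 K21L19.10 K21L19.11 K21L19.12 K21L19.13 K21L19.14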

12473080871C

12492147301600018424197742204329490317073540838986

8012G987905

116011434010371183C518992214082907530691351393798140926

88862622

23644r.

1004

19

a0038030

287192

1189703199186632100477932196

1046748403

Ki|0063236|gb| AAF17212.1|Bi|4909712|«b|AAD34459.1|8i|4836896|(fb|AAD30099.1|Bi|4455192|euib|CAB36015.1|«i|4097561Ki|2497702|«p|Q46036|t!i|2827643|emb|CAA16097.1|

gi|0223646|8b| AAF05860.1|

Uil4186184lmblAAD09623.llgi|640G158|Kb|AAF09146.1|8i|4322670lgb|AAD16120|gi|539M42||jb|AAC27»3.2|

27.7 (AF111168) unknown Homo sapiens
990 650 (AC011622) putative disease resistance protein A. thaliana
473 20.0 (AF094008) dentin phosphoryn Homo sapiens
402 96.8 [partial] (AF053941) non phototropic hypocotyl 1-like A. thaliana

K22G18 (45453 bp)

[Figure: gene map of K22G18; the label MTG10 also appears on the map. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

identifier
K22G18.1 K22G18.2 K22G18.3 K22G18.4 K22G18.5 K22G18.6 K22G18.7 K22G18.8 K22G18.9 K22G18.10 K22G18.11

Uirecti.

__+_+

+_

+

jti 0'

8294098899C

1301714435159151G342217532719137107

42659

3"19317337

11928138121585210211209012C2773144241263

45453

Sqgi| 1707016|gb|AACC9127.11Ki|2342690xi|2702273|gbj AAB91976.1|«i|4204281d!i|5042416|gb| AAD38255.1|

ni|59O3057|i(b| AAD55616.1|ni|5903057|n;b|AAD5561G.l|gi|2443329|dbj|BAA22374|gi|932|«<ib|CAA37773|

Identity  Definition
36.9 (U78721) putative AP2 domain transcription factor A.
60.3 (AC000106) Similar to Homo copine I (gb|U83246). A.
47.7 (AC003033) unknown protein A. thaliana
24.6 (AC004146) Hypothetical protein A. thaliana
40.8 (AC006193) Unknown protein A. thaliana
64.3 (AC008016) F6D8.33 A. thaliana
02.4 (AC008016) F6D8.33 A. thaliana
88.5 (D8C122) Mei2-like protein A. thaliana
32.3 (X53744) 68kDA subunit of signal recognition
40.3 [partial] (AJ130878) GCN4-complementing

24857877623839799

11341112884005389 (GCP1)


K24M7 (73999 bp)

[Figure: gene map of K24M7. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

Columns: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length, Information on the most similar sequence (Sequence ID, Overlap, Identity, Definition)

K24M7.1K24M7.2K24M7.3K24M7.4

K24M7.5

K24M7.6K24M7.K24M7K24M7.K24M7.K24M7K24M7K24M7

K24M7.14K24M7.10K24M7.10K24M7.17K24M7.18K24M7.19K24M7.20K24M7.21K24M7.22K24M7.23

K24M7.24K24M7.25K24M7.2C

20778920

12050

10835

1932022523200322955529881322823324030005

4407047702510305353854817580176070004099CC01108042

705787200173296

82751112215044

18340

2145325010293302903030357329243478236894

47227510115296854263507000033163470654796822769483

719577268473994

535 gi|320776|pir| |S30782

912

1009017710

24.9 [partial] i «0

047«2

130142208190

530790239242438200486331308241

Ki|1170839|.p|Q04980|Kijll70840|»p|Q00738|

002 «i3335303

Bi 2341042 Kb AAB70440gi|4087550lgb|AAD25781.1gi|4587000|Kb|AAD25781.1X72897Ki|2980791|emb|CAA18167.1|

Ki|101752Ki|2129950|pir||S02700

Ki 4204278Ki| 1052971 |dbj|BAA17888|Ki|4335737|nb|AADl 7415.1

Ki|2980788|emb|CAA18104.1|Ki|4894914|Kb| AAD32C52.1|Ki|3337307|Kb|AAC27412.1Ki|2060G60Kil6003C81lKblAAF00M2.llKi|5732OO0|Kbj AAD489C5.1|

Ki|4914450|eXI4902

b|CAB43094.1

365003408

82129

212107

514244210

404259400270

24184

83 285.4

70.4

03.830.730.0

100.090.8

28.601.9

30.534.747.9

60.784.276.441.099.603.6

75.694.0

low-temperature-induced 65 kd protein
low-temperature-induced 78 kd protein (desiccation-responsive protein 29B)
(AC004512) Similar to cytochrome P450 gb|X90458 from A. thaliana. A. thaliana
(AC000104) F19IM9.26 A. thaliana
(AC006577) EST gb|R64848 comes from this gene. A. thaliana
(AC006077) EST gb|R04848 comes from this gene. A. thaliana
tRNA-Ser(TGA)
(AL022197) actin depolymerizing factor-like protein A. thaliana
(L03710) cnjB Tetrahymena thermophila
photoassimilate-responsive protein PAR-1c precursor - common
(AC004146) putative Cytochrome P450 protein A. thaliana
(D90910) hypothetical protein Synechocystis sp.
[pseudo] (AC00C248) unknown protein A. thaliana
(AL022197) putative protein A. thaliana
(AF139188) HCF106 A. thaliana
(AC004481) hypothetical protein A. thaliana
(AC002342) hypothetical protein A. thaliana
(AF187871) fibrillarin homolog A. thaliana
(AF147203) contains similarity to Medicago truncatula N7 protein (GB:Y17013) A. thaliana
(AL050400) fibrillarin-like protein A. thaliana
tRNA-Leu(CAA)

K2K18 (41465 bp)

[Figure: gene map of K2K18. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

identifier
K2K18.1 K2K18.2 K2K18.3 K2K18.4 K2K18.5 K2K18.6 K2K18.7 K2K18.8

Direction

-

+

_

0"1461

74851330110777

214812771030523

39220

3'4740

103521520519125

230092970037302

41321

Exo.9

4

9

19

4

EST0

00

15

000

0

662

825404246

409CGI

1400

570

Sequence ID  Overlap  Identity
gi|4890180|gb|AAD32773.1| 407 63.0
gi|2505011|gb|AAB81881| 052 36.9
gi|4006829|gb|AAC95171.1| 4C3 73.9
gi|2511088|emb|CAA74020.1| 244 100.0
gi|4193320 402 08.0
gi|4086021|gb|AAD20640.1| 77 79.0
gi|4263831|gb|AAD10474.1| 373 81.8
gi|4203830|gb|AAD10473.1 383 02.3

(AC0070C1) tfiinilnrA. tliaJuum(AC002983) putative MuDR-A-like tIAC005970) uutHtive protein kiiiH*e(Y13C91) uiulticHtHlytic eiidopeptidnponent. ttlpIiH subunlt A. tltalittiut(AF045473) histone dencetylaae Zea tnayu[pt-eudo] (AC007170) cytoplasm^ wtoiiitnte hydrtit[pseudo] (AC006067) putative retroeleinent poA. thaliuia|p»eudo| (AC000067) liypotbetical protein A. th«li»

from Nkvtituut IHIHUUIH

ponuu protein A. tlmlitutu

•mplex. prote»»o..ie com-

1

! A. tlUllUt


K24C1 (29498 bp)

[Figure: gene map of K24C1. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

Columns: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length, Information on the most similar sequence (Sequence ID, Overlap, Identity, Definition)

identifier
K24C1.1 K24C1.2 K24C1.3 K24C1.4 K24C1.5 K24C1.6 K24C1.7

29007108

11058123782120827839

46001007G1101010318241002833C

gi|1786140[dbj|BAA19113| 346509 Ki|2982461|emb|CAA18225.1] 424973 Ki|34:>10C9|emb|CAA2046S.l| 918180 «i|3953470 1079498111G6 gi|56C8608|Kb|AAD45979.1| 165

28.0 [partial] (AB000454) PEThy
40.0 (AL022223) putative protein A.
37.1 (AL03132C) hypothetical protein
39.2 (AC002328) F22O2.21 A. thaliana
34.3 (AF110334) MenG

K2N11 (30340 bp)

[Figure: gene map of K2N11. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

identifier
K2N11.1 K2N11.2 K2N11.3 K2N11.4 K2N11.5 K2N11.6 K2N11.7 K2N11.8 K2N11.9

L)irection

+

+

0"1

66871049611380

10481

1839020913

2244020302

3 '01309907

1090813947

17032

2061321801

2404929303

Sequeqi|2832632|emb|CAAl 0761.1

921 gi|0791481|eiiib|CAB53025.1|104 gi|4836929|gb|AAD30031.1797 gi!0732431|gb|AAD49099.1

308 8i|4830700l8b|AAD30233.1

262313 gi!4044460|gb|AAD22308.1

2 268 ui|4400818|gb|AAD20126.10 1334 Ki|0734736lKb|AAD00001.1

701 731) (AL021711) hypothetical protein A. thulinna834 80.1 (AL110116) putative protein A. t/iahaii*.90 77.1 [p«eudo) (AC006085) Hypothetical protein A. tlialiuu

043 90.0 (AF177030) contain, .iuiilarity to maize traiwpi(GB:M76978) A. (Indian*

189 43.2 (AC007202) Containtt similarity to Kb|AB01709tor (WERBP-1) from Nkotitum tulxtuuii,. E S T PKb|T41870. Bb|H38232 and nb|N38320 come fromA. thaliana

310 07.9 [p»eudo] (AC006092) putative non-LTR retroelement»cripta»e A. tl.aliana

163 61.6 (AC006201) unknown protein A. ll.aliai.a1310 43.1 (AC007209) Hypothetical protein A. thaJiana

Kb|H39299.

K5A21 (13874 bp)

[Figure: gene map of K5A21. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

No. of Exon  No. of EST  Overlap  Identity  Definition

K5A21.1

K5A21.2

1 7141

9218 13792

22

18

3

4

773

621

Ki]113336|»p|ri7427

Ki|113337|«p|r>18484

709

566

41.4

38.3

partial] alpl.alph»-C lar«e

-adaptiu C (clathrin ac h a

neiiibrune tutnptorpHrtinl] tilph,lph»-C lar,-eiieinbraue ad

-adac h a

>ptor

n) (100 kd coatedHA2/AP2 adaptiu

ptin C (tlatlirin a11) (100 kd coatedHA2/AP2 adaptin

aembly proteivesicle proteialpha C Bubuseiiibly proteivesicle proteiialpha C «ubu

c)

•it)

CJ)it)

u.plex 2(plasma

ii.plex 2(plas.na


K5F14 (31178 bp)

[Figure: gene map of K5F14. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

Columns: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length, Information on the most similar sequence (Sequence ID, Overlap, Identity, Definition)

identifier
K5F14.1 K5F14.2 K5F14.3 K5F14.4 K5F14.5 K5F14.6 K5F14.7 K5F14.8 K5F14.9 K5F14.10 K5F14.11

7451901913280

140211735522479232C824101

2882430195

402191711190013078

1509020081230332371120985

2931831178

74G gi|1170021|sp|P4G870234 gi|2244797|euib|CAB10220.1|030 ni|4539330|eiiib|CAB37483.1109 gi|703400

428 gi|491433G|gb|AAD32884.1|002 gi|491433GJgb|AAD32884.1|185 gi|491433G|t!h|AAD32884.1123 «iil402902|eiiib|CAAGG907|703 gi|377G5G7

Ki|112737[sp|IM5457*i|2190559

201434-81

32204012100

509

15785

37.038.934.1

43.344.932.871.237.2

48.705.1

n-like protei(Z97330) hypothetical prob(AL030039) putative proteiIL20329) multiple banded i

In A. tluilia/utI A. tlwluuui•itigeii Urmplmiu, nJj-tk-UIL

[pseudo] (AC005489) F14N23.22 A. thaliana
(AC000489) F14N23.22 A. thaliana
(AC005489) F14N23.22 A. thaliana
[pseudo] (X98323) peroxidase A. thaliana
(AC005388) Strong similarity to F21B7.33 gi|28092C4 from A. thaliana BAC gb|AC002500. EST gb|NC5119 comes from this gene. A. thaliana
2S seed storage protein 1 precursor (2S albumin storage protein
[pseudo] (AC001229) F5I14.1C A. thaliana

K5J14 (59762 bp)

[Figure: gene map of K5J14. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

identifier
K5J14.1 K5J14.2 K5J14.3 K5J14.4 K5J14.5 K5J14.6 K5J14.7 K5J14.8 K5J14.9 K5J14.10 K5J14.11 K5J14.12

Direction 0*"~+ 0320+ 11437+ 20293+ 272014- 32031

+ 3720G+ 41203

44050+ 50244

52108

+ 5440457198

No.5 ^ Exo

753915990224082947230991

3877043G43400015057352370

0G77909389

of

1710109

10311

5

10

No. ofEST

0

004

20025

00

Length

204902409473390

007704080110

73

080370

Information on the most sinSequence IDgi|C098300[gb|AAF18094.1gi|3983139gi|4733981 |gb| AAD280C2.11Ki|4733981|Kb| AAD28CC2.1|gi|913445|bbs|lG0507

gi|3128187|gb|AAC10091.1|gi|5734790|gb|AAD50000.1gi|42042G9

gi|1820G40

gi|3289002|gb| AAC25099.ilgi|378C502

ilar sequencOverlap

183133404402374

490084032

71

022244

Identity38.038.873.478.870.9

80.381.203.7

70.4

04.720.5

Definition
(AF098011) Scythe Xenopus laevis
(AC007208) putative serine carboxypeptidase II A. thaliana
(AC0072G8) putative serine carboxypeptidase II A. thaliana
(S75487) alcohol dehydrogenase ADH=alcohol dehydrogenase homolog EC 1.1.1.1 Lycopersicon esculentum
(AC004521) putative beta-glucosidase A. thaliana
(AC007980) ATP-dependent metalloprotease A. thaliana
(AC005223) 04111 A. thaliana
(U88173) weak similarity to A. thaliana ubiquitin-like protein 8
(AF073522) CRP1 Zea mays
(AF098994) similar to zinc carboxypeptidases (Pfam:Zn_carbOpept.hmm, score: 259.73) Caenorhabditis elegans


K6A12 (64136 bp)

[Figure: gene map of K6A12. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

identifier
K6A12.1 K6A12.2 K6A12.3 K6A12.4 K6A12.5 K6A12.6 K6A12.7 K6A12.8 K6A12.9 K6A12.10 K6A12.11 K6A12.12 K6A12.13 K6A12.14 K6A12.15 K6A12.16

PositionDirection 5 ^

1+ 10553

+ 15871

4- 2583727949

+ 31333+ 37524

4079243356

+ 45972

47401+ 50320+ 52208+ 53959+ 56535

01210

No5 ^ Exc

109613300

20242

27G01298823225340424419214550747179

486805158053591562225801402850

ofn

59

9

6727244

433231

No. ofEST

11

0

0022000

110000

Length

251713

1027

340215210718357515269

289361423723335547

Information on the moat similaSequence IDgi|6175145 gb|AAF04872.1gi|2462833

gi 13850588

Ki|2129C98|pir||S01760«i4938485 enib|CAB43844.1|Ki|4220518|emb|CAA22991|Ki|3025124gi|5903000

sp]P74523|gb|AAD55615.1

lii|3122952|,p|O15730|gi|2499569|>plQ42539|

Ki| 1070305 pir||S53492Ki|2944440gi|0061974|emb(CAB62440.1gi|3004555|gb| AAC09028.ilgi|4914341|gb|AAD32889.1gi|6016698|gb| AAF01525.1

r sequenceOverlap

207080

1016

33313820713034944G13G

288360412480201511

Identity51.931.1

45.1

70.946.857.735.942.932.472.3

70.271.538.024.932.852.3

Definition
[partial] (AC010790) unknown protein A. thaliana
(AF000657) highly similar to froha and frohb. potential frolic A. thaliana
(AC005278) Contains similarity to gb|AB011110 KIAA0538 protein from Homo sapiens brain and to phospholipid-binding domain C2 TFI001G8. ESTs gb|AA585988 and gb|T04384 come from this gene. A. thaliana
protein kinase ATN1 (EC 2.7.1.-) - A. thaliana
(AL078404) putative protein A. thaliana
(AL035356) hypothetical protein A. thaliana
hypothetical 17.7 kd protein slr1419
(AC008010) F6D8.29 A. thaliana
tipd protein
protein-L-isoaspartate O-methyltransferase (protein-beta-aspartate methyltransferase) (PIMT) (protein L-isoaspartyl methyltransferase) (L-isoaspartyl protein carboxyl methyltransferase)
RNA-binding protein cp31 precursor - A. thaliana
(AF05075G) cysteine endopeptidase precursor Ricinus communis
(AL132979) putative protein A. thaliana
(AC003673) putative salt-inducible protein A. thaliana
(AC005489) F14N23.27 A. thaliana
(AC009991) hypothetical protein A. thaliana

K6M13 (77129 bp)

[Figure: gene map of K6M13. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

identifier
K6M13.1 K6M13.2 K6M13.3 K6M13.4 K6M13.5 K6M13.6 K6M13.7 K6M13.8 K6M13.9 K6M13.10 K6M13.11 K6M13.12 K6M13.13 K6M13.14 K6M13.15 K6M13.16 K6M13.17 K6M13.18

Directio,+

+-++

_

-_

+

-

Tositionti 5"

15957

10473127041587420035

301943607939876418504610947219

58971

02745

0388105925

6659672639

3 '2G940436

11327153511759828009

303533938440754457474693051270

61958

03257

0469006437

67G7274529

No.E x t m

1011903

1133

171

17

9

1

21

13

No. ofEST

070G00

00512'•>

0

0

00

00

Length

380100285497195399

ICO689159647274721

G95

171

128171

359307

Information on the most similar secSequence ID 0"vgi|6553906|gb| AAF16572.1gi J2352828gi|2191196gi|1711510|»p|r49966|Ki|4507873|ref|Nr>j003363.1gi|2275204|gb| AAB63820.il

X53175Ki|6049274|Kb|AAF02535.1

gi|2749982gi|4835230|emh|CAB42914.1gi|4049518|emb|CAA21253|

gi|3152572

gi|6598399|gb|AAD03505.2|

Ki|42G2175|Kb|AAD14492|Ki|4337175|Kb|AAD18096|

gi|0598344|gb| AAF18592.1gi|3941510

erlap379148120490102179

50402

543262298

462

161

77142

280211

Identity83.2

100.034.794.044.247.2

91.525.8

50.065.840.5

39.5

51.9

94.941.3

25.689. G

Definition
[partial] (AC012503) putative protein kinase A. thaliana
(AF009228) NaCl-inducible Ca2+-binding protein A. thaliana
(AF007271) contains a MADS domain A. thaliana
signal recognition particle 54 kd protein 2 (srp54)
von Hippel-Lindau binding protein 1
(AC002337) putative WRKY-type DNA binding protein
U1 snRNA A. thaliana
(AF151390) Sex-lethal interactor Drosophila melanogaster
(AF030705) Similar to phytoene desaturase
(AL049862) putative protein A. thaliana
(AL031852) conserved hypothetical protein Schizosaccharomyces
(AC002986) Contains homology to DNAJ heatshock protein gb|U32803 from Haemophilus influenzae. A. thaliana
(AC003952) putative non-LTR retroelement reverse transcriptase
(AC005508) 12894 A. thaliana
(AC000416) ESTs gb|T20589, gb|T04048, gb|AA597906, gb|T04111, gb|R84180, gb|RG5428, gb|T44439, gb|T7G570, gb|R9O004, gb|T45020, gb|T42457, gb|T20921, gb|AA0427G2 and gb|AA720210 come from this gene. A. thaliana
(AC002335) hypothetical protein A. thaliana
(AF062909) putative transcription factor A. thaliana


K7J8 (56963 bp)

[Figure: gene map of K7J8. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

identifier
K7J8.1 K7J8.2 K7J8.3 K7J8.4 K7J8.5 K7J8.6 K7J8.7 K7J8.8 K7J8.9 K7J8.10 K7J8.11 K7J8.12 K7J8.13 K7J8.14 K7J8.15 K7J8.16

Directi._--+

++--+

+_+

J:I 5'1

19924290

10110176732011023247250092078529449389314102044253471505192155724

3 '114630868767

16635191832210524080258562827837562390024150844687503005372050959

No. of Exon

No. of EST

Length  Overlap  Identity

(AC002292) Hypothetical protein A. thaliana
(AP000492) hypothetical protein Oryza sativa
(Z84377) xylosidase Aspergillus niger
(AL049558) hypothetical protein Schizosaccharomyces
(AL049558) hypothetical protein Schizosaccharomyces
(AF010448) No definition line found Caenorhabditis
(AC000267) putative polyprotein A. thaliana
(AC002535) putative WD-40 repeat protein A. tha
tRNA-His(GTG)
(AL023094) bZIP transcription factor ATB2 A. th
(AJ243483) ATP citrate lyase Cyanophora paradoxa
[pseudo] reverse transcriptase - A. thaliana retrotran
[partial] (AC012563) putative protein kinase A. th

3022987741182184222801453081570

090300

72181145608002348

ei|2402745l<i|5922608|dl>j|BAA84009.1|gi|2181180|etiib|CAB06417|(!ii4581502|einli|CAB40161.1|«i |4581502|emb|CAB40161.1|*i|2315451

«i|4689454|glijAAD27902.1|gi|0598380|gb|AAC02845.2|X153C1

(•it3096928|emh|CAA18838.1|«i |5304837|emb|CAB40077.1|niJ2129709|pir||S65612Ki|6553900|Kb| AAF10572.1

201104066

94164131

72792

72

128603487347

38.135.437.928.530.3

90.450.1

100.0

38.003.980.168.4

ispoayn T a l l - 1

K9E15 (62052 bp)

[Figure: gene map of K9E15; the label K18C1 also appears on the map. Tracks on each strand: Grail exon, Protein db hit, EST db hit, Gene.]

deduced genes

Columns: identifier, Direction, Position (5', 3'), No. of Exon, No. of EST, Length, Information on the most similar sequence (Sequence ID, Overlap, Identity, Definition)

identifier
K9E15.1 K9E15.2 K9E15.3 K9E15.4 K9E15.5 K9E15.6 K9E15.7 K9E15.8 K9E15.9 K9E15.10 K9E15.11 K9E15.12 K9E15.13

3375773512815

1593720160

283203150035094408344381545102

74801176513951

1807822010

309433355038151415934451148002

12610

1217 gi|5459305|eniLi|CAB50708.1|1187 Ki|2901373|euib|CAA18120|

101 8i |3033379|gb[AAC12823.1|

478

391546

229198

gi|3080375|emb|CAA18632.1[gi|2191180

Ki[3080371|euib|CAA18028.1Ki|6552736|Kb|AAF16535.1|Kii33866O0JsbJAAC28536.11

gi|4510373|gb| AAD21401.ilH7496\d\i\B\M3l

5043801908

5230761985

463 Ki|3040815|26

mb|CAA10713.1

1160 52.8 (AL022141) putative diyease resistance proteii129 58.5 (AC004238) putative WRKY-type DNA

A. tlialuuut508 69.9 [pseudo] (AL022580) putat ive protein A. t U u u u252 30.2 [pseudo] (AF007271) contains simiUrity to tropomyosin

ne>iu A. tiuiliana387 76.5 IAL022580) putative pectinacetyle.terase protein A. tfudi.

60 40.9 (AC013482) T26F17.19 A. t h j u u u578 49.4 (AC004605) putative beta-amylase A. tlinliuiiH

133 23.1 (AC007017) putative harpiii-induced protein A.\V, V>& WJ\«S-| iuita W Sintarour)i» tm.i

SWISS- rROT Acce»»ion Number T4S978 S

A. tliniiubinding

thalUuaSCD6

protein

and ki-

p88.9 (AL021087) cytochr P450 A. t/iaiiana

Downloaded from https://academic.oup.com/dnaresearch/article-abstract/7/1/31/389236 by guest on 26 March 2018

Page 18: Structural Analysis of Arabidopsis thaliana Chromosome 5. X ...

Sequencing of Arabidopsis thaliana chromosome 5 [Vol. 7,

K9H21 (15319 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced ge

identifier

nes

Lt:irectioiTo

5' 3 ' Exon ESTLengtl

Sequenice IDti similar ne<

Overlap Identity Definition

K9H21.1 K9H21.2 K9H21.3 K9H21.4 K9H21.5 K9H21.6 K9H21.7 K9H21.8

415058077192

110131193014657

575060448097

114571370515248

0

201(100

009

477172302104012140

gi| 1033072

Ki|6010737|Kb|AAF01003.1

ni|0023082]enib|CAB02340.11

Ki|6382035|Kb|AAF078)7.1|Ki|1003724

329

370

281

547133

23.9

38.0

58.9

01.057.5

dipho*phate kil(NDP kiniue II) (NDPK II)(U52004) Herpe.viru. «aimiri ORF73 liomoluij K«po«iV

(AC009325) hypotheticsl prote

(AL133315) hypothetical protei

(AC011020) putative protei[partial] (U50846) 4-coun

A. tlialiu

A. thaJitt

K9P8 (70670 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced gene

intity Definition

K9P8.2 K9P8.3 K9P8.4 K9P8.5 K9P8.6 K9P8.7 K9P8.8 K9P8.9

K9P8.10 K9P8.11 K9P8.12 K9P8.13

1220018452271983343030000382234542449834

54130091730438307827

1773222103319793025237543420904938953012

07047028070049170040

107

1082

147

14

12133

12

0

02030040

0130

001

720779928718288

1080084037

824030019002

gi |4220458

Ki|441C407|Kb|AAD20309|Ki|100003CleiKi|4539009|eiKi|5051770|eiKi|5541007|eiKil3859083|eiKi|5051775|e,

inb|CAA70310|inb|CAB39030.1mb|CAB40063.1mb|CAB01173.1ii,b|CAA22020|.iib|CAB450C8.1|

gi|0091014|>!u|AAD39002.1

Ki|80783|uir||JL0032l!i|3237304Ki|0051781|e,Kil4249382lK

mb|CAB45074.1|blAAD14479

582

028778839710141074408525

270215008498

07.9

24.088.240.088.043.729.901.053.0

20.404.084.069.7

jim BAC «l>| AC002294. Ar*bi<lopviM

o tsapiet

(AC00621G) Similar toprotein homolug from Athuliuna(AF123318) mitotic checkpoint protein Ho(Y09095) cliloride chtmnel A. thulium*(AL049481) putative protein A, thaiimia(AL078037) hsp 70-like protein A. thulimm(AL090859) putative proton A. tlmlituiH(AL033503) conserved hypothetical protein Ouulid* Maun[Ac] (AL078037) putative protein A. tltnlitut*(AC007454) Contains « rF|00501 alplm/beta i.ydroWe fold do-main. A. thalituttihypothetical 31.7K protein (aphE region) - Streptomycev grixeus(U915C1) pyridoxine 5"-phoephate oxidate fUttut* iiorvvgk-ua(AL078037) transport inhibitor reKpon^-like protein A. tludiiui*[partial] (AC005966) Strong similarity to gi|3337350 F13P17.3 pu-tative permeate from A. tlmliatia BAC gb|AC004481. Aru.bi<li>p*i*

l


No. 1]

MAB16 (70475 bp)

I I I III III I Till: I

S. Sato et al.

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced gei

identifier

l e s

b i r ettionI'o dition

b 3 'N o .

Exoiof No. t

ESTii Leii^tl

Seqiiriimti (III Oi

IDi the mot•t uiuiiltt

0;quenceverl»p I d e ntity Definition

MAB16.1

MAB16.2 MAB16.3 MAB16.4 MAB16.5 MAB16.6 MAB16.7 MAB16.8 MAB16.9 MAB16.10 MAB16.11 MAB16.12 MAB16.13 MAB16.14

955514190179902050923125295413442837347460495839GG12180433068244

10400150851878G22742243353302734943383915055460201631066752168078

0000000001200

1531412 3 04 7 7

2 1 3

11401 4 0

3187294 2 04 5 0512145

gi|5732430|«b| AAD49098.1|

si|3892701|emb|CAA22150.1 73 33.8Ki|G587850|gb| AAF18539.1 115 73.3«i|4640194|gb|AAD20867.1| 211 40.1Bi|4512G70|Kb|AAD21724.1| 432 31.48i|1001C8G|jbj|BAA10421| 01 45.2gi|99721|pir||S05465 1104 62.4Ki[4512670|gb|AAD21724.1| 138 42.4Kii5915851|np|Q42569| 277 29.5gi[4836882|8b|AAD30585.1| 722 76.8Ki|4512C51|8b|AAD2170C.l| 408 49.9gi|lG530G5|dbj|BAA18577| 277 68.7Sil0598853isb|AAF18707.1| 440 90.78i|6561951|emb|CABG2455.1| 87 47.7

MuDR.(AF177030) contain, . imil . r i t , to ul»i»« tr.u.po.oi(GB:M7G978) A. tlmli&itx(AL033545) hypothetiojil protein A. thulium*(AC006551) Hypotbetititl protein A. thuliaiia(AC007230) T23K8.1 A. tiialiuia(AC006931) putiitive tytochrome P450 A. thulium*(D64002) hypotbeticjil protein Syitechtxyutia sp.retrovirua-related polyprotein - A. thuliunu retrotriuiypo.on Tnl-3(AC006931) putative tytothroine P450 A. thuliutiuCYTOCHROME P450 90A1(AC007260) ldlpttj.eq No definition line found A. 6li«li«n«(AC007048) putative tyrosine jiminotrHns.ferape A. thuliunu(D90915) peptide chnin release factor Synechocj-Btie ap.(AC010556) putative ferine carboxypeptidiwe A. tha/iana(AL132964) hypothetical protein A. thaliana

MBM17 (52717 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

idei

uced

titter

genes

D i r ctiuTon! ti on

i 0N o . A No. i

EST\ Le iKth Infon

Sequemtioice I

1 O

bi the mo t niiiiilwr yequ

Overlaentep MKI tity b e h i ition

MBM17.1 MBM17.2 MBM17.3 MBM17.4 MBM17.5

MBM17.6 MBM17.7 MBM17.8

MBM17.9 MBM17.10 MBM17.11 MBM17.12 MBM17.13

6569129

1437718280

257203311536125

3951042787449304666051088

7350125121734924528

327363503937922

4185644598403024815752717

2 42

102 0

30107

88034

66920

1102705

1053

Ki|2501242|s'p|Q13472Bi|2924777|(!b|AAC04906.1Bi|3540207Bi|4185142|gb|AAD08945.1

1081 gi|3913525|»p|O48901|387 Ki|601673C|gb|AAF01562.1353 gi|3913518|«p|Q42546|

353 «i|2765607|emb|CAB05889347 gi|2705607|emb|CAB05889246390 Ki|3236254|gb|AAC23042.1402 8i[33373Cl|t!b|AAC27406.1

8351060681608

1044333332

337346

395401

46.259.650.034.6

81.360.288.0

64.295.4

75.044.8

[Partial]DNA topoi»o.nera»e IIIIAC002334) putative receptor-like protein kina»e A. thIAC004260) Putative protein kina»e A. tlwliaiia(AC005724) putative SNF2/RAD54 family DNA rep,combination protein A. thuYmuuDNA polymeraae delta catalytic chain(AC009325) unknown protein A. thaJiana3'(2-).5'-bi.pho«phate uucleotidara (3'(2').5-bi.pho»phu3'(2>pho»phohydrola.e) |DPNPa«e)(Z83312) 3'|2').5'-bi»pho»phate uucleotida.e A. thuliu,,,(Z83312) 3'|2').5'-bispbo.pl1ate nucleotidate A. thalia/,.

(AC004084) unknown pro[partial] (AC004481) unk

A. t


MCK7 (87090 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes

identifier: MCK7.1 MCK7.2 MCK7.3 MCK7.4 MCK7.5 MCK7.6 MCK7.7 MCK7.8 MCK7.9 MCK7.10 MCK7.11 MCK7.12 MCK7.13 MCK7.14

MCK7.15 MCK7.16 MCK7.17 MCK7.18 MCK7.19 MCK7.20 MCK7.21 MCK7.22 MCK7.23 MCK7.24 MCK7.25 MCK7.26 MCK7.27 MCK7.28

Dir ection

_

4.

_

_4--4-

+

_4__

4._

___

51

33477071

13392171391950222733260852767729240314803268G3420635290

4046G42283459494825850850531415573558033632106648670223728867848882791

3"31185701

125901622018282224882509727170289893111932299334973505740160

4179044241479904913052766554155730660806640976833971059743728009086992

Ext i n

171

13867734G434

18

60222

123718233

10

EST00202060102431

015

100

04000000

512780

1307750199552254283324424100211209728

337408654203589442466571296406404316325931

Sequence IUgi|5391442|gb|AAC27293.2|gi|4115383|gb|AAD03384.1gi|4559347|gb|AAD23008.1gi|4400192|emb|CAB30515.1gi|4097561gi|4204205gi|1001650|dbj|BAA10381gi|135532|sp|r23253|gil4455704|einb|CAB36617.1|gi|3122387|sp|O22407|gi|4003719|ref|Nrj»2003.11gi|G130546|sp|r72777gi|1653230|dbj|BAA18145|gi|5734021|dbj|BAA83352.1

gi|1709798|sP|r54778|gi|2700839|gb|AAB95307.11gi|2245024|emb|CAB10444.1gi|19463C9|gb| AAB03087.1|gi|462579|sp|r2152S|

Overlap Identity Definition[partial] (AF053941) non pliototropk- hypocotyl 1-Hke A. thali*(AC0059C7) putative receptor-like protein kiutu>e A. th*U*uu(AC00C&8S) hypothetical protein A. tlutlltuia(AL035440) putative protein A. thttliana(U64918) ATGPl A thalum*(AC005223) 45043 A. t/utWwt(D64002) hypothetic! protein Syttt^Uocyutla up.tfiah'dtife (neumuiinidnxe) |NA) (major *urfnce nutigen)(AL03M78) hypothetical protein SC2G&.30 Streptvmyct* cuelicuWD-40 repent protein MSI1fragile hltftidiiie triad geneYCF54-like protein(D90912) hypothetical protein Syued'ocyutw op.(AP000391) EST* AU067992(C11433)rAU077424[C11433) confpond to a region of the predicted gene.

26» protease regulatory HubuuH Ob houiolog(AC003100) putative receptor-like protein kiiuute A. thaliniia(Z97341) cyanohydrin lyase like protein A. thalUua(U93215) unknown protein A. thuJiaiutinalate dehydrogenafe. chloroplaet precursor (NADP-MDH)

gi|6580145|emb|CAB63149.:gi|1940370|gb|AABC 3094.1gi|0225108|>p|Q9ZG89|gi|3250035|euib|CAA74046|gi|5381253|dbj|BAA82306.1gi|5381253Jdbj|BAA82306.1

51170048369719827020113214942310197

141491

400594244544394

53464

129437299313

100.000.601.047.080.447.041.137.030.797.047.448.045.174.4

92.848.448.630.785.0

52.907.740.958.064.366.2

(AL1329G8) MAr kinas.(U93215) unknown proteinGTP-bindinK protein CGP/(Y14274) putative *erine/tli(AB027752) peroxidase Nk,(AB027702) peroxidase Nio

A. thuiianui> A. thaJia.

* t*}jiu;u[n

e Sorglmm biculor

MFB16 (66087 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

identifier: MFB16.1 MFB16.2 MFB16.3 MFB16.4 MFB16.5 MFB16.6 MFB16.7 MFB16.8 MFB16.9 MFB16.10 MFB16.11

MFB16.12 MFB16.13 MFB16.14 MFB16.15 MFB16.16 MFB16.17 MFB16.18 MFB16.19

n EST Sequence IU OverlapI 0 229^ gi |6598507[gb|AAF18020.1 12fJ 3 O ~3 2 540 gi |6091732|gb |AAF03444.1 | 528 49.0

14 0 548 gi |4581150|gb|AAD24034.1 | 440 59.08 0 372 gi |322598|pir | |S28004 345 70.21 1 67 gij0523097|einb|CAB62355.1| 61 43.53 1 359 gi |5931083|ei i ib |CAB50595.1| 179 43.3

10 2 320 gi |4455237|enib |CAB36736.1 | 316 81.16 0 303 gi |5M1703|emb|CAB01208.1j 295 50.36 5 349 gi[5541703!einb|CAB51208.1| 209 48.11 0 1851 0 135 gi |5732004|gb|AAD48903.1[ 114 84.3

3 0 107 gi |4400239lemb|CAB3C738.1| 100 09.21 0 139 gi|4097547 39 70.03 1 162 gi|4097547 1213 7 364 gi|2645971 3361 2 183 gil3090944lemblCAA18854.il 580 0 342 gi |0541703|euib |CAB51208.1 | 311

19 1 838 gi |4455240|einb|CAB36739.1 | 5246 fj 289 gi |1619002|emb|CAAC9976| 220

Identity Definition_ ^

(AC006053) putative DnaJ protein A. thaliana
(AC010797) unknown protein A. thaliana
(AC006919) hypothetical protein A. thaliana
Stl2p protein - A. thaliana
(AL133315) putative protein A. thaliana
(AJ011C43) squamosa promoter binding protein-like 0 A. thaliana
(AL035523) ubiquitin activating enzyme-like protein A. thaliana
(AL09G800) putative protein A. thaliana
(AL0968G0) putative protein A. thaliana

[pseudo] (AF1472G3) contains similarity to transposase A. thaliana
(AL035523) abscisic acid-induced-like protein A. thaliana
(U64906) ATFP3 A. thaliana
(U64906) ATFP3 A. thaliana
(AF034255) reversibly glycosylated polypeptide-3 A. thaliana
(AL023094) putative protein A. thaliana
(AL09G860) putative protein A. thaliana
(AL035523) putative protein A. thaliana
(Y0872C) MtN3 Medicago truncatula

71719368

1435910722200472298925477274093119135993

3791740133412904521348927509405441060011

55039070

12411100781G922212802488726782291423174536398

3841040549419944061449475528435902461974


MGO3 (43570 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced gunw

identifier: MGO3.1 MGO3.2 MGO3.3 MGO3.4 MGO3.5 MGO3.6

MGO3.7

MGO3.8 MGO3.9

MGO3.10 MGO3.11

2921383499374548377

1227273212825177451958522320

2730031090

210212408144001880C2171224051

25227 20867

290443207G

37290 4020C4101C 42254

272

Ki|5902002|ref|NPJ)0898G.lKi|0308787|ifb|AAF07 308.1Ki|0598300|Kb|AAF18098.1Ki|4887754|gb|AAD32290.1Ki|0498404|dbj|BAA87853.1]

398 gi|3367520

495 gi|3510253|«b|AAC33497.1327 8i|5203312|Bb|AAD41414.1|

485 gi|4510402|gb|AAD21489.1!320 Ki|5541C82|emb|CAB51188.1[

1350 49.5 polymerase (RNA) III (DNA directed) (155kD)
448 43.7 (AC010852) hypothetical protein A. thaliana
224 33.3 (AC002354) hypothetical protein A. thaliana
519 50.4 (AC00C533) ankyrin-like protein A. thaliana
293 3G.7 (AP000816) EST AU030604(E51294) corresponds to a region of the predicted gene.
301 39.4 (AC004392) Similar to protein kinase APK1A, tyrosine-serine-threonine kinase gb|D12522 from A. thaliana. A. thaliana
413 30.0 (AC005310) hypothetical protein A. thaliana
200 54.3 (AC007727) Contains 3 PF|0080C Pumilio-family RNA binding domains (PUF). A. thaliana
444 41.1 (AC000087) putative AP2 domain transcription factor A. thaliana
132 45.1 (AL09G859) putative protein A. thaliana

MHM17 (78423 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

No. ofEST

Length Info]Overlap Identity Definitic

MHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMlMHMl

.1 -2 +.3 +.4.5 +.0 4-.7.8.9

012 -3 +4 +5 +0 +7 +89 +

.20 +

422474079471

1242310350170892317729418392424104845544483445207855979599220180905214085077219175435

71418470

102331503010008197072004032710393154218440782490745407159045012230428008287090877377170204

104273

115514

48

132

117444

00002800020120300420

093224227075135374400523

74299224187458441370524795197423140

Ki[1723495Ki 12244927Ki 12244927Ki 15089465Ki 13090931Ki 14538940

sp|Q10414|emb|CAB10349.emb|CAB10349.dbj|BAA83010.1emb|CAA18841.emb|CAB39C7C.

Ki|3894198|gb| AAC78547.1[Kil5123568X54513Ki 14538942

Ki|1100355Ki|3070398Ki|4538939

emb|CAB45334.

emb|CAB39C78.

KH AAC14530.ilemb|CAB39C70.

Ki|3212870|Kb| AAC23421.1|Ki|1399181Ki|6593498|Kb|AAD10106.2Ki|59O2371|Kb|AAD00473.1Ki|3914239Ki|4538935

sp|O04719emb|CAB39071.

535| 44I 75

731 1 "I 373

71| 402

G7290

107451439345523545109422123

25.942.228.939.258.007.938.953.188.050.8

31.050.980.975.190.259.233.097.270.0

hypothetical 03.2 kd protein(Z97339) hypothetical prote'(Z97339) hypothetical prote'(AB028987) KIAA1004 prot(AL023094) putative ribosot(AL049483) nucleosome asse(AC000002) hypothetical prc(AL079344) cytukinm oxidastRNA-Val(CAC)(AL049483) uncharacteriued

C1F3.09 in chromosome In A. thalianan A. thalianain Homo sapiensml protein SIC A. thaliananbly protein I-like protein A. thalianatein A. thalianae-like protein A. thaliana

protein A. thaliana

(U33058) UNC-89 CaenorliaWitis el<-gan«(AC004484) unknown protei[pseudo] (AL049483) Col-0 c(AC004005) putative N-myr(U50738) lycopene epsilon c[pseudo] (AC005917) putativ(AC009322) Hypothetical prPROTEIN PHOSPHATASE(AL049483) putative proteit

i A. thaitanaasein kinase I-like protein A. thaiianastoyltransferase A. thaitanaclase A. thalianae protein kinase A. thalianautein A. thaliana2C ABI2 (PP2C)A. thaliana


MIF21 (59372 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes

MIF21.1

MIF21.2

MIF21.3MIF21.4MIF21.5MIF21.CMIF21.7MIF21.8MIF21.9MIF21.10MIF21.:

No. ofExon

No. ofEST ferlap Identity DeHnition

309 2T2 (U70559) DNA repair/trt ion protein Min.l9p Sactha-

33.2 (AC009917) Contains a bZIP transcription factor PF|00170 do-main. ESTs nb|R30400: gb|AA000904. gb|AI994521 come fromthis gene. A. thah'ana

79.2 (AJ003130) polygalacturonase A. (Indiana00.7 (AC007211) putative SCARECROW gene regulator A. (Indiana76.3 (AC00917C) unknown protein A. (Indiana31.1 (AL078037) putative protein A. (Indiana05.3 (AC00917C) unknown protein A. (Indiana80.3 (AC009803) unknown protein A. thahana

MIF21.MIF21.MIF21.MIF21.MIF21.MIF21

109

7900

100251411519037243473290130094308704007441545

6389

10002

120271000821430248173380335781388824038243058

441885091053337548205030258082

471585270554728553925732759259

025 gi|0034702|gb|AAF19742.1

011

091124

4905741 5 73 0 11 0 22 1 01 0 3

3 7 9

gi|2982083|emb|CAA05892| 384gi|4585920|gb|AAD20580.1| 408gi|0400900|8b|AAF13095.1| 573gi|5051703|emb|CAB45050.1| 121si|C4C0955|gb|AAF13090.1| 292gi|0041838|gb|AAF02147.1| 101

480339373100322130

ui|4587010|gb|AAD20838.1|

gi|1542941|emb|CAA55006|gi|4929099|Bb|AAD34110.1|gi|040C948|gb|AAF13083.1|

gi|4580245|emb|CAB40980.1| 305

305 04.2 (AC00C951) putati

402 89.3 (X7811C) Acetoacetyl-coenuyine138 30.7 (AF101873) CGI-110 protein Homo «*372 07.8 (AC00917C) unknown protein A. tha/i;

indole-3-glyterul phosphate

A thiola.e lUpljaiiun n

syntha

58.5 (AL049C40) putative protein A.[ ]

MJB24 (58589 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes

MJB24.1 MJB24.2 MJB24.3

MJB24.4 MJB24.5 MJB24.6 MJB24.7 MJB24.8 MJB24.9 MJB24.10 MJB24.11 MJB24.12 MJB24.13 MJB24.14 MJB24.15 MJB24.16

+-

+

---

--

+++-

Position

17345129

11949

14010174871900022735250002857131090327403500830728424305200055249

40307743

13710

15020193382268524311271433002031873340493500541084440995312908089

N o . of

1453

001

289282

2 2

82

13

No. ofEST

001

0001

105000048

Length

5 9 17 3 74 9 1

100379

10125024262891 2 0

288102902312250777

Information on the most similarSequence ll>

«i|4914419|gi|44G8809

xi|4539303|Ki|CC30404|Ki|5915831Ki |4539305|Ki|6319895|gi]022C017|

emb|CAB39C59.1emb|CAB43070.1e.nb|CAB38210

emb|CAB39C0C.lgb|AAF19502.1•PJO64718Iemb|CAB39C08.1ref|NP_009970.1sp|P06724|

gi|4511988|gh|AAD21548.1

gi|3093294|gi |20010001

Ki|l 14339|.

emb|CAA73320|sp|Q40784|

p|P20431

Overlap5900104 8 4

3707005003452801 1 9

258

901302

770

Identity00.354.070.3

74.828.170.377.540.002.551.7

08.871.9

97.0

Uetinition(AL049483) predicIAL050352) putatiIAL035C01) cytoA. thaliana

(AL049480) putati

ted protein destination factor A. thaJianave protein A. tha/ia/iachrome P400 monooxygenase (CYP91A2)

ve protein A. tha/ia/ia(AC007190) F23N19.4 A. (IndianaCYTOCHROME P450 71B9(AL049480) putatiProtein carboxyl n60. acidic ribosom.(AF08889C) ubiqui

(Y12782) putativepossible apospory-.

|partial| plasma 111,

ve protein A. tha/ianalethylase»1 protein P3 (Pl/P2-like)none methyltransferase Zymomoua. mobilu

villin A. thaiianaassociated protein C

emhrane ATPase 3 (proton pump)


MJM18 (16203 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

identifier: MJM18.1 MJM18.2 MJM18.3 MJM18.4 MJM18.5 MJM18.6 MJM18.7

Direction 5'66

3329+ 7080

8857113751239814890

3'181750148403

10020121451327616203

No.E x i in

574321

1

No. of

EST0001000

LenKth

157207250322242293436

Information on the most similarSequence IDKi|29795CC|Kb|AAC00175.1Ki|2979500||!b|AACOC175.1

Ki[2956707|emb|CAA70370|Ki[5042160|emb|CAB44685.11gi|2979559|Kb|AAC00108.1Ki|3292823|emb|CAA19813.1|

Overlap156202

281238264393

Identity52.248.3

65.254.450.945.4

Definition[partial] (AC003680) MADS-box protein(AC003680) MADS-box protein (AGL20)

(Y1C778) peroxida»e Spiriacia ofcraosa

(AGL20) A. [IndianaA. tlialiana

(AL078020) cytoclirome P450-like protein A. thalia/ia(AC003680) putative PCF2-like DNA bin[partial] (AL031018) putative protein A.

idiuK protein A. tlialianatha/iar;»

MJP23 (31827 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes

Identity Definition

MJP23.1 MJP23.2

MJP23.3 MJP23.4

MJP23.5 MJP23.6 MJP23.7 MJP23.8 MJP23.9 MJP23.10 MJP23.11

2803

C2279910

1129011818141841993C24204287982974C

4200

700010987

11005139001844822398202022902331174

393 Ki|5302776|emb|CAB400:>4.1|468 gi|4409008|emb|CAB38269|

227331

215461795401295192232

Ki|1755066gi|729774|»p|P41152|

X580088i|4098647si|3914083|»p|P73025|gi|4469009|emb|CAB38270|gi| 1800147

yi[3258570

314 65.7 [partial]^(Z97337) hypothetical protein A452 54.7 IAL035G02) UDP rhamnose ai

rhamno.yltrausferase-like protein A. thali118 44.5 (U63012) lectin precursor Sophora japonic250 42.4 heat shock factor protein HSF30 (heat si

30) (HSTF 30) (heat stress transcription216 100.0 U3 suRNA A. lhaliana448 99.8 (U80008) homogentisate l,2-dioxy|(enase743 31.9 muU2 protein343 79.1 (AL035602) putative protein A. tliaJia/ia294 70.9 (U83055) membrane associated protein A

227 59.0 (U89959) Unknown protein A. tiiahi.na

thucyanidin-3-gluco.ide

factor)

A. tlmJia

MKN22 (27229 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes

identifier: MKN22.1

MKN22.2 MKN22.3 MKN22.4 MKN22.5 MKN22.6 MKN22.7 MKN22.8

Direttio+

+++

_

Poyitioiin 0'

1

710010GOO1481018077200102128420108

3"2397

9273117301087018800207542193020800

No. ofExon

12

0271423

No. ofEST

4

0117020

Length

481

004144420

GO

132172403

Information on the most dinSequence IDgi|2000277|tfp|P08927|

Ki|6087804|Kb|AAF180&0.1|

Ki|3128108|Kl>| AAC1CO72.1|Ki|4039428|emb|CAB38 961.1Ki|2032100|emb|CAB114C9|

Ki|3193321

lilar sequenceOverlap

479

290

340| 09

80

200

Identity89.0

52.4

GG.350.054.0

37.8

Definition

(60 kd clmperc(AC012680) p.

(AC004521) ui(AL049171) P>

(Z98700) HrKhi

(AF009299) N

>nin bet* »ubujtntive RNA-b

iknown proteiiitHtive proteinlyl-tRNA yyntl

u definition lin

nit)iutlii

i A.A. (

lettu

,e fo

(CPN-00 beta)UK protein A. thuiia

thalintia

'•.haliixim

.•:. A. thaJJana

uud A. thaJittuu

nit precursor


MMI9 (81736 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

identifier Identity Definition

MMI9.1 MMI9.2 MMI9.3 MMI9.4 MMI9.5 MMI9.6

MMI9.7 MMI9.8 MMI9.9 MMI9.10 MMI9.11 MMI9.12 MMI9.13 MMI9.14 MMI9.15 MMI9.16 MMI9.17 MMI9.18 MMI9.19 MMI9.20 MMI9.21 MMI9.22 MMI9.23 MMI9.24

1995565079749489

17883

2399427206307813307738015410184284144855503385304055052

621496335472134738307000777758

532971248927

1145923577

2001929431327953693339432424194300449801512365305155069

627576029973305750207705681502

226

U10

1000

003040

1001104000400

327071231223549938

327511404420236229117

1168245204206202203982348440214811

Ki]2979555|Kb| AAC06104.ilKi| 1488521 |eu]b|CAA08194|Ki|3241945|Kl>|AAC23732.1!Ki |3335171 |Kb|AAC27073.ljKi|4512098|Kb| AAD21751.ilgi|5734642|dbj|BAA83373.1

Ki|3292811|emb|CAA19801.1ni|4522009|Kbj AAD21782.1|Ki|4454020|einb|CAA23073|Ki|3401815|Kb| AAC32909.1|Ki|2583118|Kb|AAB82027.1|Ki| 1000971 |dbj|BAA05009|Ki|135095G|s>p|P49200|Ki|4589590idbj|BAA70817.1l!i|6224938|Kb|AAFO6022.1Ki|5541705|enib|CAB51210.11Ki|5541705|emb|CAB51210.11Ki|5541705|einb|CAB51210.1|Ki|1871577|enib|CAA72315|Ki|0630464|Kb|AAF19552.1Ki|2901375|tnnb|CAA18122|gi|3702325|«b|AAC62882.1

lii|0522529leiub|CAB01972.1

324033165119394911

1964922182232201241104532441G6171201187835315105

810

44.088.252.401.742.309.6

30.530.747.930.831.732.899.141.989 835.334.961.939.420.151.328.3

70.8

(AC003G80) unknown protein A. thaliana(X99938) RNA helica*e A. tliaiiaiia(AC004025) unknown protein A. tluJiaiu(AF0G7858) embryo-specific protein 3 A. thahana(AC000S69) unknown protein A. tlwliaiia(AP000391) EST. C22007(S0014),C22650(S0014) corre.pond toregion of tlie predicted gene.(AL031018) hypothetic»l protein A. thaluui*(AC007009) unknown protein A. thaiiana(AL035396) putntive protein A. tluliiuui(AC004138) hypothetical protein A. tlialimm(AC002387) hypothetical protein A. tlialiaiw(D2C076) chloride channel Ory< tol«gu» cuniculu.40. ribosomal protein S20 (S22)(AB023190) KIAA0973 protein Homo napioia(AF19902C) putative lran»cription factor A. thuliumIAL090800) putative protein A. tlwliuui(AL090860) putative protein A. tliajiaiui(AL090860) putative protein A. tliajiaiia(Y11553) putative 21kD protein precur.or Maliuxfu mtiv.(AC007190) F23N19.4 A. tlialiaiia(AL022141) NAM like protein A. t);aji».,«(AC005397) hypothetical protein A. tliajimia

lP»rt

MNB8 (46872 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

identifier: MNB8.1 MNB8.2 MNB8.3 MNB8.4 MNB8.5 MNB8.6 MNB8.7 MNB8.8 MNB8.9 MNB8.10 MNB8.11 MNB8.12 MNB8.13 MNB8.14

Overlap Identity32

7313101581370699138

1217523380288003045134018388364231844105

4841747449671238484

1135020018257132980731429369134101943G3340326

932

3

4032200003000

170179222272413

109572024525C438000217498

gi|1652779|dbj|BAA17698|gi|1652387|dbj|BAA17309(

|4972115|etnb|CAB43972.1| 271|S107033|gb|AAD39930.1 4124914414|emb|CAB43665.1 8971408192 4834972112|emb|CAB43969.1 216|6539250|gb|AAF15920.1 96

i|3342450 1475080792|gb|AAD39302.1 417

,iG22C013|.p|Q9ZEA4| 15412645229 306

40.7 (D90908) hypothetical protein Synechocystis sp.
47.1 (D90905) hypothetical protein Synechocystis sp.

55.9 (AL078579) putative protein A. thaliana
86.7 (AF133708) PP2A regulatory subunit A. thaliana
42.4 (AL050352) Ca2+-transporting ATPase-like protein A. thaliana
22.1 (U59294) myosin heavy chain Placopecten magellanicus
42.4 (AL078579) hypothetical protein A. thaliana
48.5 (AC0117G5) hypothetical protein A. thaliana
33.8 (AF071233) lipolytic enzyme Sulfolobus acidocaldarius
30.1 (AC007576) Unknown protein A. thaliana
23.9 50S ribosomal protein L9
23.5 (U78597) kinesin light chain Plectonema boryanum


MJE7 (74298 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes

identifier: MJE7.1 MJE7.2 MJE7.3 MJE7.4 MJE7.5 MJE7.6 MJE7.7

No. tExou

ngth InfoSequel Overlap Identity Definition

(AC000348) T7N9.20 A. tlujia4890

1160315730217842331425354

1945107791406218693222092453128007

6181337807847142406459

gi|2213600 593 40.6gi|2244847|emb|CAB10269.1| 354 27.0Ki|6041847|sb|AAF02156.1| 819 55.9Ki|G041847|j<b| AAF02150.1 787 55.3

gi|285741|dbj|BAA03413 398 30.3*i!3915984|tfp|P33642| 229 28.3

(Z97337) hydri(AC009803) unknown(AC009853) unknown

(D14550) EDGr precuhypothetical 39.5 kd(ORFZ)

i A. Italia;i A. Mialia.

otein houioloK A. t/iaJia

.or Daucu»,ddc-reducta: fiint 3'regiun (DADA")

MJE7.8MJE7.MJE7.MJE7.MJE7-MJE7.MJE7MJE7.MJE7.MJE7.MJE7.MJE7-

01

23

6789

2871731647

+ 37022+ 37229+ 40233

43354->- 48654+ 53499+ 56826

6399204518

+ 71047

310313478437093394334107443056492755429861063042490030073301

814

17

21

217

1

1

7

0401

3

0

00001

544654

72364166101151224671

86203200

Ki|4725941|emb|CAB41712.1Ki|281122GZ11880gi|3873500|embjCAA22127|

K'.j 1420887Ki|4733902|Kb|AAD28645.1Ki|2601311|Kb|AAB87091.1)Ki|2792304l!i|0041842|i!b|AAF02151.1Ki|3643271Ki|3915958|*p|Q08270|

531003

64-188

7233

201422

64241

53

61.898.392.227.0

39.776 561.919.150.938.000.0

(AL049730) putative(AF042609) fimbrintRNA-Lyx(TTT)(AL033534) neriue-ri

(U34334) nou-,pecifiIAC0072C11 unknowIAC002336) hypothe(AF040964) unknow(AC009853) unknow(AF090872) 33 kDa[pceudo] hypothetic^

pollen-.2 A. thah

h protei

lipid tr

ical prol protein, proteinecretoryHIT-lik

jecific protein A. thaJiunaaria

L StiiraoBatc/iarornyt.* pornb*-

*n»fer-lik« protein Plia«ro!u» vulgar-i.

ein A. U.aliariaIT1 Hvinti sapiensA. 6ha/ianaprotein Orywi tativu

e protein MJ0800

MNI5 (21011 bp)

K6M1

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

dedu

identiMNI:MNI:MNI:MNI:

iced genes

ifier Dh. .1, 2..3. .4

ection-

-

I W ' ™ . '

7009433

1091419420

3'44579042

1358721008

E x i , » '

61

112

EST0123

48170

563452

S.Ki

K>

,ouence ID1769887]emb;CAA05051

4827060|ref|NPJ)05099.1|6500764|Kb|AAF10704.1

Overlap480

507439

Identity97.3

45.744.1

Definition(X95730) amiiiu H<

xyluloki inure (if. it[partial] (AC01010

,id penm

iSutatzHt?)•$) F3ML

:ase 0 A. t/iaJiaua

liomoloK8.12 A. tliaJiaija

MPI10 (29605 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes

identifier: MPI10.1 MPI10.2 MPI10.3 MPI10.4 MPI10.5

Direction

Position 5'

13709

183602321924880

N o .

10875427

224832309529304

J !

33

171

13

No. ofEST

10

0

Length

302398724159

1087

InforinatioSequence 1

v;i |4538944*i| 4538943

Ki|4612705

i on the must simiD

emb|CAB39680.1|emb|CAB39679.1|

KbjAAD21758.1!

ar sequenceOverlap

397723

404

Identity

42.580.5

02.7

DetiniSptirtij.{AL04(AL04

i o n

1]9483) put9483) put

1] {AC00C

utive

669)

trdiiwcription fbetii-^HlHctyyii

putwtive protei

H*e A. thuiiiuiit

i kiiitwe A. thaliaim


MQL5 (88398 bp)

[Gene map. Tracks, top to bottom: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced gene

identifier BMQL5.1

6

llrectiou 5 ^ -0073

3'0317

Exon EST1 0 4 1 0

Sequence IDgi|4314371|Kb|AADl 0082.1

Overlap 262

Identity 42.6

Definition (AC006340) similar to mammalian MHC III region protein G9a

MQL5.2 MQL5.3 MQL5.4 MQL5.5 MQL5.6 MQL5.7 MQL5.8 MQL5.9

G8539729

12777102291756720368

73021160714488170571935421090

22829 24170

28100 29313

032

70

0

1

150220280159357243

329

329

Ki|4895238|Kb|AAD32823.1| 183Ki|4539454|emb|CAB39934.1 107«i|2245111|emb|CAB10033.1 150«i|2240110|emb|CAB10032.1 300l<i|3434969|dbj|BAA32419| 242

8i|4330720|«b|AAD17398.1| 278

«i|3434970|dbj|BAA32422| 299

40.748.1

MQL5.10 MQL5.11 MQL5.12 MQL5.13 MQL5.14 MQL5.15 MQL5.16 MQL5.17 MQL5.18 MQL5.19 MQL5.20 MQL5.21 MQL5.22 MQL5.23 MQL5.24 MQL5.25 MQL5.26 MQL5.27 MQL5.28 MQL5.29

+-

-

++++-__-t-

+-t-

-

3246934390

37753

4002641499403804046749067020470001207849610900279004827098177534377504805578135084104

3380030918

40407

4085543898407734771401013032900700009473026570422000989728137688079801809258319188220

01

1

10

2144

76714

11331

1013

00

0

000002

22105030001

2 9 5

843

8 8 0

1 1 0623

90410240244314293310477283098300000123208804

gi|2980800gi|3C00040

gi|3000040

«i|1703219gi|4490297Ki|6119525gi|6522919gi|224G108gi|730688|*gi|2245107gi|2245107Ki|2245107

euib|CAA18182.1

»p|P54120|enib|CAB38788.1Kb|AAF04169.1emb|CAB0210C.lemb|CAB10G30.1p|r39097|euib|CAB10029.1emb|CAB10029.1emb|CAB10029.1

gi|4836917|Kb|AAD30619.1|Ki|11701C9 sp P4CC01gi|3152613|gb|AACl 7092.1(!i|10766C0Jpir||S51839

gi|2245101gi|2245100

enib|CAB10523.1emb|CAB10522.1|

203839

8 3 2

72601

88371220211233213230304198347304

2 1 4

510

49.242.5

41.9

49.304.038.201.371.499.066.766.409.120.979.934.253.4

70.748.5

(AC007609) putative VAMP-associated protein A. thaliana
(AL049000) contains EST gb:AA72841C A. thaliana
(Z97343) GTP-binding RAB1C like protein A. thaliana
(Z97343) hypothetical protein A. thaliana
(AB008104) ethylene responsive element binding factor 2
(AC00C248) putative non-LTR retroelement reverse transcriptase A. thaliana
(AB008107) ethylene responsive element binding factor 5 A. thaliana
(AL022197) putative protein A. thaliana
(AF080119) similar to A. thaliana disease resistance protein RPS2 (GB:U14108) Arabidopsis thaliana
(AF080119) similar to A. thaliana disease resistance protein RPS2 (GB:U14108) Arabidopsis thaliana
AIG1 protein
(AL035C78) putative protein A. thaliana
(AC011560) hypothetical protein A. thaliana
(AL132978) putative protein A. thaliana
(Z97343) EREBP-4 like protein A. thaliana
40S ribosomal protein S19, mitochondrial precursor
(Z97343) thioesterase like protein A. thaliana
(Z97343) thioesterase like protein A. thaliana
(Z97343) thioesterase like protein A. thaliana
(AC007153) 80099 A. thaliana
homeobox-leucine zipper protein HAT2 (HD-ZIP protein 2)
(AC004482) hypothetical protein A. thaliana
D13F(MYBST1) protein - potato
(Z97343) hypothetical protein A. thaliana
[partial] (Z97343) DNA-binding protein

MSD23 (33479 bp)

[Figure: gene map of clone MSD23. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes

identifier, Direction, Position 3', No. of Exon, EST, Sequence ID, Overlap, Identity, Definition

MSD23.1 MSD23.2 MSD23.3 MSD23.4 MSD23.5 MSD23.6

40997031

12094143251784720192

02319580

13987157751890726907

0

00

290208293258

i|1465368|emb|CAsi|3875770|emb|CAA92093.1

8i|185054Ggi|2245131|emb|CAB10552.1(•i|4510429|gb|AAD21515.1|

248289223

33.5 (Z08297) SimilaritySAP02 (TIR Ace. No A47C55)

93.2 (U88045) syntaxin related protein AtVam3p A. thaliana
72.8 (Z97344) hypothetical protein A. thaliana
51.8 [pseudo] (AC006929) putative non-LTR retroelement reverse transcriptase A. thaliana


MQM1 (81365 bp)

[Figure: gene map of clone MQM1. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes: identifier, Direction, Position 3', No. of Exon, No. of EST, Length

Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition

MQM1.1 MQM1.2 MQM1.3 MQM1.4 MQM1.5 MQM1.6 MQM1.7 MQM1.8 MQM1.9 MQM1.10 MQM1.11 MQM1.12 MQM1.13 MQM1.14 MQM1.15 MQM1.16 MQM1.17 MQM1.18 MQM1.19 MQM1.20 MQM1.21 MQM1.22 MQM1.23 MQM1.24 MQM1.25

5095

58981185215855172812084G28519312993381334071

4250445CSC47379

498825227750444C083C02979

05191C794271019731887594978088

0784

94041405110739204202091730370325033405341550

440714032148512

5192450702080940239804043

004440874072721700497818281361

00

003607003

031

09013

000000

8 4 0230

497499295489

72285337

651179

464212349

490083025175308

109208430206306286

tO|4100903|enib|CAA77232|gi|4176538|einb|CAA22872.1<

Ki|4309972|Kb|AAB81870|Ki|3328838|Kb| AAC68007.1|Ki|6033999|dbj|BAA09700.2|Ki|4990890|emb|CAB4431G.lD00935Ki|3747111gi|3330378|gb| AAC27179.il

gi|3874214|cmb|CAB00083.1

Kil4220491|Kb|AAD12714.1K1J2583121 |K1>| AAB82030.1gi J2384910

Kij2129548|pir||S71190Ki|4502897|ref|NP_001285.1Ki|4900074|Kb|AAD34008.1Ki|4200249|emb|CAA22897|gi!G030089|dbj|BAA88530.1

Ki|6225984|sp|Q9ZCRG|gi|5487873|gb|AAD04946.2

Ki|0017097|Kb|AAF01580.1Ki|C017097|nb|AAF01580.1Ki|0017097|Kb|AAF01580.1

1G0211

37219901

488W

284170

1144

50207203

489035109127307

90220

9490

123

33.129.2

33.233.537.183.490.3

100.043.8

42.5

43.995.230.3

93.144.024.540.687.7

51.530.6

51.040.233.1

[partial] (Y18C20) DtfPTPl protein A. thxliu(AL030260) dna-directed rim polymerase iii suromycea pombe(AC002983) hypothetical protein A. thaiiana(AE001314) PolyA Polymerase Clilnmydia. trmh(D03479) KIAA0145 protein Homo wpirno(AJ242C59) o«rinii palmitoyltrHiinfertwt! SohuiumtRNA-Gln(TTG)IAF095041) MTN3 hotnolog A. thalUim(AC003028) putative MYB fnmily trtu»criptiun

E1-E2 AT

fjtctor A. tluil'uuui

Puse YEL031W(Z83217) Similarity to{SW:YEDl_YEAST)(AC00C069) hypothetical protein A- tltalitui*EAC002387) unknown protein A. tlmlinim{AF022982) contain* »iiiiil»rity to a DNAJ-Hke domain CtMxwrlmfj-ditw f/fga.n»calcium-dependent protein kiiiase (EC 2.7.1.-) - A. th»li«nacleft lip *iiid palate associated traii^iiiembrane protein 1(AF149049) M protein precursor Strcptoco<<UH pyognit*(AL030297) hypothetical protein Homo mpientt(AP0009C9) ESTy D39011(R0009). AU032023(R3215) correspondto a region of the predicted gene.50s ribo*omal protein L24(AF110333) PrMC3 rinut. radiate

(AC009895) hypothetical protein A. thuUana(AC009895) hypothetical protein A. thaliaua[partial] (AC009895) hypothetical protein A. tfiaJiana

MRG21 (55151 bp)

[Figure: gene map of clone MRG21. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Clone label in the figure: K19D1.]

deduced genes: identifier, Direction, Position 3', No. of Exon, No. of EST, Length

Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition

MRG21.1 MRG21.2 MRG21.3 MRG21.4 MRG21.5 MRG21.6

MRG2MRG2MRG2MRG2MRG2MRG2

29070923

129341300919374

226032011029282302200134704288

43879548

130061017722138

250002720234741370090328055041

91

21

00I

41690020

379950010450221

gi|0520231|dbj|BAA87957.1| 227 44.3gi|5882743|gb|AAD00290.1 000 67.3

gi|0903050|gb|AAD55615.1| 37 71.1gi|5882745|Kb|AAD55298.1| 618 62.0gi|1255871 ' 90 33.0

Ui|2C23300|Kb|AAB80452.1| 320 49.5gi|4078333|euib|CAB41144.1| 903 84.5gi|4C78332|euib|CAB41143.1| 013 73.8gi|207073|«p|r29512| 429 90.78i|207073|«p|r29512| 220 93.7

IAB028232) helin-loop-lielix protein liouiolog A. ij.aliaiIAC008203) EST» gb|H30134 «nd gb|H30132 come from

[p.eudo] (AC008010) F0D8.29 A. tlialiana(AC008203) F20A4.24 A. tttaliaiia(U53341) .l.ort region of weak nimiUrity to bovine me.ceptor pO3 (PIR:S28503) Citeitorliabditi* eJegau«

(AC002409) unkn(AL049C08) H+-t(AL049C08) putative peptide tr»itubulin bet«-2/beta-3 chain[partial] tubuliu beta-2/bet»-3 cli

protein A. «;»luuuiportinB ATPum-like protein A.


MSK10 (81414 bp)

[Figure: gene map of clone MSK10. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes

identifier, Identity, Definition

MSK10.1 MSK10.2 MSK10.3 MSK10.4 MSK10.5 MSK10.6 MSK10.7 MSK10.8 MSK10.9 MSK10.10 MSK10.11 MSK10.12 MSK10.13 MSK10.14 MSK10.15 MSK10.16 MSK10.17 MSK10.18 MSK10.19 MSK10.20 MSK10.21 MSK10.22 MSK10.23 MSK10.24 MSK10.25 MSK10.26 MSK10.27

4084469574278890

1011211072

1780423430245082538226246

4017705881199579

10C0115536

2193824017248372589727258

27694 30523

33826 36842

377044207847382

5045552783

592045999900859

0493367193730047698279942

387604407349456

5111158870

597216005662767

65295670177410377G7981414

932 «i|6623976|gb|AAF19229.1| 643178 gi|46C2646|sb|AAD2C916.1| 119488 gi|4262158|gb|AAD14458| 243231 Ki|5068781|Kb|AAD4G007.1| 230230 8i|3252818|Kb|AAC24188.1| 202113 Ki|3252818|gb|AAC24188.1| 97

1488 *i|3047071 028

961 *i|3047072 960142 Ki|3047070 14090 Ki|3805759|gb|AAC69115.1] -79

147 gi|30470C9 146248 «i|4773911|Kb|AAD29781.1| 247

800 gi|3047008 805

748 *i|3047073 588

321 Ki|5032274|«b|AAD38222.1364 Ki|4585912|Kb|AAD25573.1691 gi|6539553|dbj|BAA88170.205 gi|40380CC|gb| AAC97247.1| 91

1307 Ki|4263544|Kb|AAD15358.1| 738

205 gi|4038000|gb|AAC97247.1679 gi|30470CC

380 «i|4080179|sb|AAD27547.1| 191133414 gi|3047001 413

84.335.8

97.5

90.3100.040.0

100.091.9

90.4

udo] (AC007505) Similar to Athila ORF 1 A. thaliaIAC000429) hypothetical protein A. tliahana[pseudo] (AC005275) hypothetical protein A. tljaliana[paeudo] (AC007894) F21H2.6 A. thaliana(AC004705) hypothetical protein A. thalituia(AC004705) hypothetical protein A. H.«li»l:a[pseudo] (AF058825) tiimilar to inaine traiicpo^o(GB:M7C978) A. tl,«Ji«n»(AF058825) No definition line found A. tlmlMnu(AF058825) No definition line found A. l);«!i»/;a(AC005693) hypotheticIAF058825) No definition lii(AF147259) contni™ »imil»thetical proteins(AF058825) .i.nilar to III»A. lli«Jian»[p.eudo) (AF058825) conti>proteins A. tlialimia[pfeudoj (AF147264) No definifn

e found Aty to a fa

uila

ly of A. thaliaua hypo-

u*on MuDR (GB:M76978)

ity to retrotran

id A. thalia

i-like

line f(AC006298) hypothetical protein A. thali

51.7 [p«eudo] (AP000836) Sin.ilarto Oyza au.fralra.diB retrotran.posonRIRE1 (D85597) O I J . . « I ™

58.7 [pseudo] (AC005897) hypothetical protein A. thaliana
64.1 [pseudo] (AC000250) putative Athila retroelement ORF1 protein A. thaliana
37.9 [pseudo] (AC005897) hypothetical protein A. thaliana
100.0 (AF058825) contains similarity to retrovirus-related POL polyproteins A. thaliana
38.0 [pseudo] (AF111709) polyprotein Oryza sativa subsp. indica
93.7 [partial] (AF058825) similar to A. thaliana retrotransposon (GB:L47193) Arabidopsis thaliana

MUF8 (13776 bp)

[Figure: gene map of clone MUF8. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genesStltK

[pnrti«l] (Y12776) hypotlietk(AC011622) puttttive di^we(AC005107) puUtive dis.e^e[pseudo] (AC000170) putMtiv

i A l l

1 A. thaliaMUF8.1MUF8.2MUF8.3MUF8.4

30 22C338M 73509160 12C421333C 13770

71110181008

147

Ki|G456160|xb|AAF09148 1|K'I J3757516 Kb AAC04218.1Ki|3738337|i(b|AAC63678.1

509872988

70

51.946.850.162.0


MSN2 (62927 bp)

[Figure: gene map of clone MSN2. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Clone label in the figure: MUD21.]

deduced genes

identifier: MSN2.1 MSN2.2 MSN2.3 MSN2.4 MSN2.5 MSN2.6 MSN2.7 MSN2.8 MSN2.9 MSN2.10 MSN2.11 MSN2.12 MSN2.13 MSN2.14 MSN2.15 MSN2.16

Direction

+--_--

+

+-_++

5*1

52238G12

14716108822070023248

30500

3981142014444234732253130544505807059181

3"109271679662

10909181052197620980

32002

4116443966409004928853960582715874602620

Exoi

82111

13

1

2G442

1G1

10

No. of No. ofEST

Info theSeque ID Definition

[partial] (Z99708) putative protein A. thaliana
(Z99708) putative protein A. thaliana
(AC004482) hypothetical protein A. thaliana
(AC005724) unknown protein A. thaliana
(AC005724) unknown protein A. thaliana
[partial] (AC005724) unknown protein A. thaliana
dolichyl-diphosphooligosaccharide-protein glycotransferase (EC 2.4.1.119) 50kD subunit - human
(AF077407) contains similarity to UDP-glucoronosyl and glucosyl transferase (Pfam: UDPGT.hmm, score: A. thaliana
(Z99708) homeodomain protein A. thaliana
(AL132979) protein kinase ATN1-like protein A. thaliana
(Z97341) hypothetical protein A. thaliana
(AL132979) zinc finger protein A. thaliana
(AC007202) T8K14.10 A. thaliana
(AF143940) SWI2/SNF2-like protein A. thaliana
tRNA-Glu(TTC)
(AJ001809) succinate dehydrogenase flavoprotein alpha subunit A. thaliana

0 240 gi|400C88C|emb|CAB10810.10 451 gi|4000880|emb|CABlC81C.l0 321 gi|3152605|gb|AAC17084.1|0 398 gi|4185129|gb|AAD08932.10 408 gi|4185129|gb|AAD08932.13 405 gi|4185129|gb|AAD08932.15 442 gi|C27424|pir||A44054

5 481 gi|3319344|gb|AAC20233.1

228 gi|4000894|emb|CABlC824.1[405 «i|C501975|emb|CAB02441.1[414 (•i|530279C|emb|CAB40038.1|500 gi|C501973|emb|CAB02439.1(247 s i |4835707|gb|AAD30234.1|704 gi|4720079|gb|AAD28303.1[

72 K00193034 gi|3C00471|emb|CAA05025

413309380334372390

51.240.853.937.039.145.845.1

1003402854972427C3

U033

59.055.108.949.447.794.095.C92.1

UDP-85.94)

MUD12 (22601 bp)

[Figure: gene map of clone MUD12. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes: identifier, Direction, Position 5' 3', No. of Exon, No. of EST, Length

Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition

MUD12.1 MUD12.2 MUD12.3 MUD12.4 MUD12.5 MUD12.6 MUD12.7

14315120729097921539717322

348454018755147001073318098

19180 22550

499114302115294459

|1200250|emb)CAA02470gi|0225913|«p|O24415|giJ4507210jgb| AAD23031.1|gi[5791483|emb|CAB53527.11gi|0319759|ref|Nr_009841.1|j!i|5903074|gb| AAD55032.1|

1017 Ki|5903073|gb|AAD56031.1|

04278895

52387

53.839.154.243.439.4

(X90990) s tpkl protein kinase So/ariu00. acidic ribo«om»l protein T2B(AC007113) hypothetical protein A. tlujiuia(ALllOllf i) putative protein A. thallium.Mitochoiidrii.1 ribosomal protein MRPL27 (Yin(AC008017) Similar to part of downy mildewR P P 5 A. tluluiui[partial] (AC008017) Similar to di .ea.e reA. tiuJiaru


MUL3 (82010 bp)

[Figure: gene map of clone MUL3. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes

identifier, Overlap, Identity, Definition

MUL3.1 MUL3.2 MUL3.3 MUL3.4 MUL3.5 MUL3.6 MUL3.7 MUL3.8 MUL3.9 MUL3.10 MUL3.11 MUL3.12 MUL3.13 MUL3.14

50076813

12614159452881841570

51339559745840901201682027444177582

54131061714488229053044845388

55440574155933867736691887080881412

G07G2GGO39010993301271

102839722G1184115435615

Ki|3377007 599 90.0Ki|C143887[gb|AAF04433.1| 281 20.0Ki|4914414|einb|CAB43060.1 1009 70.2si|434700|dbj|BAA04803 112 42.0Ki|4388818|gb|AAD19773.1 049 74.9

Ki|491441G|emb|CAB43667.1| 1024 44.1«i|3022900|Kb|AAC34232.1! 344 47.0Ki|4914417|euib|CAB43668.1 210 03.0gi|4604997|ref|Nrj002303.1| 774 31.0Ki|5809758|einl)|CAA41032.1j 113 59.0Ki|4538928|enib|CAB39064.11 369 53.2ei|453892G|e.nb|CAB39062.1 604 83.8

(AF050020) auxin transport protein EIR1 A. thaliana
(AC010718) unknown protein A. thaliana
(AL050352) Ca2+-transporting ATPase-like protein A. thaliana
(D212C2) ORF Homo sapiens
[pseudo] (AC00C028) putative retroelement pol polyprotein A. thaliana
(AL000352) putative protein A. thaliana
(AC004411) hypothetical protein A. thaliana
(AL000352) putative protein A. thaliana
DNA ligase IV
(X58827) AT-LS1 product A. thaliana
(AL049483) putative protein A. thaliana
(AL049483) putative phosphatidylserine decarboxylase A. thaliana

MWD22 (87180 bp)

[Figure: gene map of clone MWD22. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes

identifier, Overlap, Identity, Definition

MWD22.1 MWD22.2 MWD22.3 MWD22.4 MWD22.5

MWD22.6MWD22.7MWD22.8MWD22.9MWD22.MWD22.MWD22.MWD22.MWD22.MWD22.

MWD22.MWD22.17MWD22.18MWD22.19MWD22.20MWD22.21MWD22.22MWD22.23

MWD22.24

MWD22.25

MWD22.26

290000037123

12082

13523152001804322797260432707231123340513749838089

53934550475791CG150GG450GG59880890375219783C7

83G47

8GC20

4438G9579398

13143

1511717GC120550252552744129093330893513638298531G8

544G656888G0853G3104G5609G7229720257740581920

8C058

87180

0

530316131305220

2G7379

442

0 3600 2500 2670 1837

1413913882233G8257819

gi!H69544|t-p!P42762|

gi|1172494:t<p|P43335|

k;i|2136800,pir||S598G3gi|3947013|eml.|CAA1946513 i 11175381 |*p|Q09709|^i|5GC8787|yb|AAD4G013.1gi|4220480|j<l->|AADl 2703.1

gi|3540182|<ijl20849?|dbi|BAA07323Ki|5732055|^b|AAD48954.Ki|1504030|dbj|BAA13214,

Ki|26GG93|*p!r29525|i=!;i|4559310|Kb|AAD22979.1|(<i|47G8996|gb|AAD29711.1gi|4185499gij5042171|emb|CAB44090j!!i|4455258|eiIib|CAB36757gi:4455259|cinb|CAB30758.gi 4490304|tmb|CAB38795.

xi G572330|emb|CAB62977

Kil3451321kmbjCAA20438|

gil3445238 einb:CAA18481.1|

124

235113

1271| 25C

347232190

349129204

1161

138169123142

I1 365250796580370

40.4

65.729.8

53.137.038-526.232.5

49.456.245.;22.8

49.G22.947.640.C50.872.057.780.434.2

] ERD1 protein precursorh y p o t h e t i c ! 39.2 kd protein RV2228C

AF094831) iron ^.terin-4-fxlphii-ciirb.ydroxy-tetrnhydr

protein) (PCD)ityA binding protein II - bovine

AL023828) LDNA EST EMBL:M89008 come;- fix.ypothetitcd 44.9 kd protein C18B1102C in cliroiAC007894) F21H2.12 A. th*liMutAC00G069) unknown protein A. thallium

[AC004122) Unknown protein A. tlmlinimD38125) EREBP-4 Nkotim.H tabm-uni

48925..49138] fD8G978ot-mid K12D12(Z49OG9JLEOSINAF129131) put.Ht.ive ZAF140498) hypotheticAF090095) fertilizatioiAL078620) putative pr

AL035G23) putative SeAL035678) putative p[AL09G766) dA59Hl8.2east, worm and plantAL031323) putative troiiiycex pott thepartial] (AL022347) pi

) Hm

c3 bi1 pro-hideot.ein

r/Th

Inuvpred

tlli-LT

ativ

Ltlit e ipe l

A.

r VA.

elctepti

P

o a C.wpiciis

ig prote, Oryx*dent. ;-ethHliHiia

oteiii kthalinitHroteiii

d) proteon or s[

oteiu A

l^fititn j

n

vtttivaed 2 p.x

iinilwrin.) Hoicing frt

tlutlittti

teir

th*.

o hino nttol


MWF20 (91193 bp)

[Figure: gene map of clone MWF20. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes: identifier, Direction, Position 5' 3', No. of Exon, No. of EST, Length

Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition

MWF20.1 MWF20.2 MWF20.3 MWF20.4 MWF20.5 MWF20.6 MWF20.7 MWF20.8 MWF20.9 MWF20.10 MWF20.11 MWF20.12 MWF20.13 MWF20.14 MWF20.15 MWF20.16 MWF20.17 MWF20.18 MWF20.19 MWF20.20 MWF20.21 MWF20.22 MWF20.23

00048703

15023246203117C3078838040412874648900110

00190

66892C8708

70401

72420

7427770109

7823082003830768014089503

7048102001674520313328973092839973432014010100507

60988

6801670210

71710

73816

7434976091

8170082745839688852889077

332516524021024327043CKi

Ki

K'KlH<

«l

|2341034|Kb|AAB70434||3809190|dbj|BAA34390|10024282780347|dbj|BAA24281|2780348]dbj|BAA24282||2780349|dbj|BAA24283|

8i|3770980|emb|CAA09190gi|1903308!Kb[AAB70439

1933 ei|4203G56|gb|AAD15377.1

370201

305

362

73154

131096

gi|1903309|gb|AAB70441iKi|1706714l»[i|ro3070

Biil903360|gb|AAB70442|

Ki|1903300|Kb|AAB70442[

AC000106gi |2341039|eb |AAB70443|

908 gi|3901294

gi|0042037|t;bjAAF20218.1gi|0324710|ref|Nr.014784.1i

479331

523320

113

303

327250

361

357

73137

127383

70.690.794.096.697.390.8

94.004.0

62.060.3

03.0

61.7

94.087.0

00.920.0

(AC000104) F19P19.10 A. thaliana
(AC000104) F19P19.13 A. thaliana
(AB00574C) inorganic phosphate transporter A. thaliana
(U62330) phosphate transporter A. thaliana
(AB000094) inorganic phosphate transporter A. thaliana
(AB000094) inorganic phosphate transporter A. thaliana
(AB000094) protein phosphatase 1 catalytic subunit A. thaliana
[pseudo] (AJ010406) RNA helicase
(AC000104) Similar to Nicotiana EREBP-3 (gb|D38124).
[pseudo] (AC006136) putative non-LTR retroelement reverse transcriptase A. thaliana
(AC000104) F19P19.21 A. thaliana
electron transfer flavoprotein beta-subunit (beta-ETF) (electron transfer flavoprotein small subunit) (ETFSS)
(AC000104) Similar to Arabidopsis 2A0 (gb|X83090). EST gb|T76913 comes from this gene. A. thaliana
(AC000104) Similar to Arabidopsis 2A0 (gb|X83090). EST gb|T76913 comes from this gene. A. thaliana
tRNA-Ala(AGC)
(AC000104) Similar to Nicotiana lesion-inducing ORF (gb|UCC269). A. thaliana
(AF089711) rpp8 A. thaliana
(AC012390) unknown A. thaliana
actin-related protein

MWJ3 (42356 bp)

[Figure: gene map of clone MWJ3. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

deduced genes: identifier, Direction, Position 5' 3', No. of Exon, No. of EST, Length

Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition

MWJ3.1 MWJ3.2 MWJ3.3 MWJ3.4 MWJ3.5 MWJ3.6 MWJ3.7 MWJ3.8 MWJ3.9

li

32818500

100291702318400239902702333027

73799730

1704C1819422877200872872030242

04000000

1274 K79 K

649 K

224 K1491 K306 K366 K

1072 K

100304641^1 AAF19002.1112642210i|2240080|einb|CABl0002.1i

i297|emb|CAB30832.1:i|3779020|ii!b|AAC07205.1

»297|emb|CAB36832.1j|4450297|emb|CAB36832.1|:i|4012630|8b|AAD21099.1

91578

548131

1490342342

1002

27.778.521.953.880.900.000,606.8

trtial] (AC004C84) putaA. tiuduuw(AC007190) F23N19.4 A. tluliuu(AF03038C) NOI protein A. tliali.(Z97343) uiyuoin beavy chain like(AL030028) bypotbetical protein(AC005171) putative retroelemen(AL030028) hypothetical protein(AL035528) hypothetical protein(AC004793) Contains reverseTFI00078. A. tliWiaiui

:eptor-like protei

A. tliattit pol pc

A. tlulii

.lyproteir

;riptas


MWP19 (11026 bp)

[Figure: gene map of the clone. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

identifier DirectioMWT19.1 +

MWT19.2 +

ru , i t ,on

1244

7984

3 '0437

11024

Exo1

1

EST0

0

1731

1013

Sequence 1I>Ki|4309763|Kb|AAD15532.1[

Ki|3779026|Kb|AACC720S.l|

Overlap290

0 1 1

Identity30.1

C3.0

Definition

[pseudo] (AC00C217) putative retroelement pol polyprotein A. thaliana

[pseudo] (ACO0M71) putative retroelement pol polyprotein A. thaliana

MXK3 (81494 bp)

[Figure: gene map of clone MXK3. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon.]

identifier

MXK3.1 MXK3.2 MXK3.3 MXK3.4 MXK3.5 MXK3.6 MXK3.7 MXK3.8 MXK3.10 MXK3.11 MXK3.12 MXK3.13 MXK3.14 MXK3.15 MXK3.16 MXK3.17 MXK3.18 MXK3.19 MXK3.20 MXK3.21 MXK3.22 MXK3.23 MXK3.24 MXK3.25 MXK3.26 MXK3.27 MXK3.28 MXK3.29

Directioi+

+

_-_++

_

++

-

++

_-

Position

4834185750

1133712249142941927022410

271142908831981345883823541076432234645151222528325614458285609000246164998

6655970013712507303575202

3'18913735

11211

1185813847170281957922481

286433120032677352714144642677456565005852394557595777800335621270376905947

0901070085728867304075085

No.Exo

31

14

20721

2222

1414

181

1249051

131512

EST000

00400

2001

0001031

0000

00400

106651

145380092

8572

479171109

92487334504795391513428344200262316

57873

3172021 1 3

Sequence ID

gi|0091736|gb|AAF03448.1

Ki|0324551|ref|NP_014620.1|ni| 105 2892|<lbj|BAAl 7810Ki|6469125|emb|CAB61744.1AC004392*i|544184|.p|Q06801

gi|3851530

gi|5929906|gb|AAD50036.1

Ki|0466953|xb|AAF13088.1Ki|0502304!emb|CAB02602.1|Ki|1806140|emb|CAA05979|Ki|0587800|Kb| AAF18552.1Ki|4263515|Kb|AAD15341Ki|4507198|nb|AAD23614.1Ki|4903004|dbj|BAA77841.1|«i|4522005|gb|AAD21778.1

ni|3859036M2C108gi|4959108|i!b| AAD34237.1|

Ki|G573707|t<b|AAF17687.1|

ir sequenceOverlap

441

46

341533

2772

400

319

576353447363

02198224310

57774

316

100

Identity57.0

48.9

27.251.782.198.0

64.0

74.7

43.727.475.000.863.576.925.335.4

92.798.6

100.0

44.0

Definition

(AC011098) putative beta-1,3-glucanase A. thaliana
(AC010797) putative WRKY-like transcriptional
Yol022cp
(D90909) ABC transporter Synechocystis sp.
(AJ275310) hypothetical protein Cicer arietinum
tRNA-Leu(AAG)
tionating enzyme) (D-enzyme)
(AF005435) nodulin Glycine max
(AF102150) COP1-interacting protein CIP8 A. tha
(AC009176) unknown protein A. thaliana
(AL133421) putative protein A. thaliana
(X97314) cdc2MsC Medicago sativa
(AC012680) putative mitochondrial carrier protein
(AC004044) hypothetical protein A. thaliana
(AC0071G8) putative GTP-binding protein A. thali
(AB021981) UDP-N-acetylglucosamine transporter
[pseudo] (AC007069) putative non-LTR retroelement
(AF095453) asparagine synthetase A. thaliana
tRNA-Arg(ACG)
(AF083914) annexin A. thaliana
(AC009243) F28K19.24 A. thaliana

r.gulator pro-

P P

lituia

A. Uialiana

•aita

Homo sapiensit reverse tran-


MYN8 (54528 bp)

[Figure: gene map of clone MYN8. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Clone label in the figure: K19E1.]

deduced genes

identifier, Sequence ID, Identity, Definition

MYN8.1 MYN8.2 MYN8.3 MYN8.4 MYN8.5 MYN8.6 MYN8.7 MYN8.8

8700144441838424443299054127901258

8804162272392428731330005038753149

1320C

1

4040

10n

35205

10401181070

2216338

Ki|4972120|emb|CAB43977.11si|3860270|t!bjAAC73044.1gi|4450349|eiiib|CAB30759.11«i|4455350jemb|CAB3C700.1|(Ci|123530|»p|r04929|8i|23170C|»p|r29018|gi|417073|»p|Q034C0|Ki|4972109|emb|CAB439CC.l|

34262976

1752201

332

88.005.857.3

30.778.272.4

(AC005824) unknown protein A. thaliana
(AL035524) putative protein A. thaliana
(AL035524) putative protein A. thaliana
cell division control protein 2 homolog 1
glutamate synthase precursor (NADH-GOGAT)
(AL078579) putative acyl-CoA binding protein A. thaliana

MZN1 (81672 bp)

[Figure: gene map of clone MZN1. Track labels: Grail exon, Protein db hit, EST db hit, Gene, Gene, EST db hit, Protein db hit, Grail exon. Clone label in the figure: K19M22.]

deduced genes: identifier, Direction, Position 5' 3', No. of Exon, No. of EST, Length

Information on the most similar sequence: Sequence ID, Overlap, Identity, Definition

MZN1.1 MZN1.2 MZN1.3 MZN1.4 MZN1.5 MZN1.6 MZN1.7 MZN1.8 MZN1.9 MZN1.10 MZN1.11 MZN1.12 MZN1.13 MZN1.15 MZN1.16 MZN1.17 MZN1.18 MZN1.19 MZN1.20 MZN1.21 MZN1.22 MZN1.23 MZN1.24 MZN1.25 MZN1.26

11607140191715318M02095428557330173454830107

3905842848

47745

024380407709017

000370348400342089947030072907740017820880G33

124301031217338202772079030377339283409738397

413494C028

50050

040900707209922

013490404108204700907118173708758057891981207

833517308248

62402

109560710450

352

13008704gi|1053859|dbj|BAAl8709|«i|4185131|gb|AAD08934.1|

Ki|3420050|gb| AAC31851.il«i|3935138|gb| AAC80581.1|*i|4587989|gb| AAD25930.1|Ki|5541697|emb|CAB51202.

gi|6522600|enib|CAB61965.1|

561 gi |902923 |dbj |B AA075471

909 gi| 1771381 |emb|CAA65127|

597 gi|1771381|emb|CAA65127|

204 gi|6180043|gb|AAF05700.1|519 gil3201477lemblCAA06808.ll302 gi|3695403

386 Ki|&051764|etnb|CAB45057.1|576 gi|2134979|pir||I38909310 gi|3242704|«b|AAC23756.1|225 Kij3242704|^b| AAC23756.1|226 «i|3242704|gb| AAC23756.1|302 «i|3242704|gb| AAC23756.1|101 «iJ0522597|eiiib|CAB019G2.1|188 gi|5031275|xb|AAD38143.1|

153153101

343490092103

30.034.4

100.0

09.030.247.979.8

18400009

309322220224220270100182

100.080.447.1

37.031.047.082.773.004.907.702.3

(AF049230) unknown A. thaliana
(D90917) hypothetical protein Synechocystis sp.
(AC005724) putative RING zinc finger protein A. thaliana
(U62742) Ran binding protein 1 homolog A. thaliana
(AC004680) hypothetical protein A. thaliana
(AC005106) T25N20.2 A. thaliana
(AF085279) hypothetical Cys-3-His zinc finger protein A. thaliana
(AL096800) putative protein A. thaliana
(AL133292) 1-aminocyclopropane-1-carboxylic acid oxidase-like protein A. thaliana
(D38544) phosphoinositide specific phospholipase C A. thaliana
(X95877) phosphoinositide-specific phospholipase C Nicotiana rustica
(X95877) phosphoinositide-specific phospholipase C Nicotiana rustica
(AF192490) cyclophilin A. thaliana
(AJ000021) putative PRL1 associated protein A. thaliana
(AF090373) contains similarity to the pfkB family of carbohydrate kinases (Pfam: PF00294: E=1.6e-75) A. thaliana
(AL078637) putative protein A. thaliana
damage-specific DNA binding protein 2 - human
(AC003040) hypothetical protein A. thaliana
(AC003040) hypothetical protein A. thaliana
(AC003040) hypothetical protein A. thaliana
(AC003040) hypothetical protein A. thaliana
(AL133292) RNA binding-like protein A. thaliana
(AF139496) unknown Prunus armeniaca
