Introduction to Gene Mining Part B: How similar are plant and human versions of a gene? After...

Post on 28-Dec-2015



1

Introduction to Gene Mining

Part B: How similar are plant and human versions of a gene?

After completing Part B, you will demonstrate how to use NCBI BLASTp and www.Araport.org data to determine whether Arabidopsis thaliana and human muscle protein genes and gene products are homologous.

The Arabidopsis Information Portal is funded by a grant from the National Science Foundation (#DBI-1262414) and co-funded by a grant from the Biotechnology and Biological Sciences Research Council (BB/L027151/1).

These lessons were developed during the summer of 2015 as education outreach for the www.Araport.org portal in conjunction with the J. Craig Venter Institute, Rockville, MD, 20850, USA.

Contact information

General information: araport@jcvi.org

Jason Miller, Grant Co-Principal Investigator, JCVI jmiller@jcvi.org

This lesson was prepared by Andrea Cobb, Ph.D. (adcobb@fcps.edu)

with the help of Margot Goldberg (mgoldberg1@pghboe.net)

2

In Part A, our sample question was:

Can we study your muscle disease using a plant model?

3

We used the NCBI portal to find names of human muscle genes.

4

We also found the function of the human actin alpha 1 gene (ACTA1) and asked, “Might plants need that same function?”

5


We used NCBI BLASTn to search Arabidopsis thaliana for genes that align to human ACTA1.

6

We learned that “alignment” is achieved by using an algorithm that maximizes local matches between two sequences.
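To make "maximizing local matches" concrete, here is a minimal sketch of the classic Smith-Waterman local alignment score. It is an illustration only: BLAST itself uses a faster heuristic search, and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for this example, not BLAST's defaults.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best score of any local alignment between strings a and b
    (Smith-Waterman dynamic programming; scoring values are illustrative)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # The 0 lets an alignment restart anywhere, which is what makes it "local".
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


# Toy sequences (not real genes): only the shared "ACG" stretch scores.
print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

Because the score can never drop below zero, unrelated flanking regions do not penalize a well-matched stretch in the middle.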

7

We learned how to use the BLASTn report scores, with Query cover, Ident and E-values, to choose a statistically meaningful alignment.
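The three report columns can be combined into a simple filter, sketched below. The `Hit` record and the hit names are hypothetical. The E-value cutoff (0.00001) and the 25% identity floor follow the homology criteria used later in this lesson, while the query-cover cutoff is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str            # hypothetical label for a row in the BLAST report
    query_cover: float   # percent of the query covered by the alignment
    identity: float      # percent identical positions ("Ident")
    e_value: float       # expected number of chance hits this good


def meaningful_hits(hits, min_cover=50.0, min_identity=25.0, max_e=1e-5):
    """Keep hits that pass all three cutoffs, best (lowest E-value) first."""
    keep = [
        h for h in hits
        if h.query_cover >= min_cover
        and h.identity >= min_identity
        and h.e_value < max_e
    ]
    return sorted(keep, key=lambda h: (h.e_value, -h.identity))


# Made-up rows for illustration; these are not real BLAST results.
example = [
    Hit("hit_A", query_cover=99, identity=88, e_value=0.0),
    Hit("hit_B", query_cover=20, identity=90, e_value=1e-30),  # covers too little
    Hit("hit_C", query_cover=95, identity=40, e_value=0.5),    # likely chance match
]
print([h.name for h in meaningful_hits(example)])
```

Requiring all three at once guards against a short but perfect match (high Ident, low Query cover) being mistaken for a meaningful one.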

8

Explain: Gene Discovery Scorecard

In a group of 3-4 students, examine your gene discovery scorecard and then:

Infer characteristics of genes which were in both A. thaliana and humans.

Identify characteristics of genes present in humans but not found in plants.

9

Explain

What information so far indicates whether or not plants have animal muscle genes?

What additional information might you need to be certain whether or not plants have animal muscle genes?

10

Part B: Evaluating homology: How similar are plant and human versions of a gene?

11

Recipes handed down often change

12

Which parts of the recipes were conserved (were almost the same) in all generations’ recipes?

Which parts were not conserved?

13

Reasons why a recipe might be changed

• Discuss in groups and report your ideas.

14

How might you track the passage of a recipe from one generation to the next if you can’t ask the cooks?


15

How is a gene like a recipe?

• Discuss in groups and report your ideas.

16

What features of a gene might make it a version of another gene?

Record your answers.

https://www.youtube.com/watch?v=gCxrkl2igGY is a song you might remember.

17

• What is homology?

• What criteria do scientists use to classify particular genes and their protein products as homologs?

Explore

18

• Homology: a general term describing two or more genes that share an ancestral gene

• How might recipes be “homologous”?

19

To use a plant model for my patient’s disease, I need to find a plant homolog of his ACTA1 gene. We found that the Arabidopsis thaliana ACT7 gene is a version, but is it similar enough to be a homolog?

20

Should we search for homologs using a gene sequence or a protein sequence?

21

The structure of a eukaryotic gene is complex!

The amino acid sequence of the protein is more likely to be conserved than the gene sequence.

Translation (protein synthesis)

http://nitro.biosci.arizona.edu/courses/EEB600A-2003/lectures/lecture24/lecture24.html

22

A BLASTp using the gene product’s amino acid sequence is likely to find protein homologs

A BLASTn might find more differences than similarities
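One reason a protein search can succeed where a nucleotide search struggles is that several codons encode the same amino acid. The sketch below uses the standard genetic code and two hypothetical coding sequences (invented for this example, not taken from the real ACTA1 or ACT7 genes) that differ at the DNA level yet translate to the same peptide.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding DNA string, reading whole codons from position 0."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a, b):
    """Percent of positions that match between two equal-length strings."""
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / max(len(a), len(b))


# Hypothetical sequences that differ only at third codon positions.
dna_1 = "ATGTGTGATGAAGATGAA"
dna_2 = "ATGTGCGACGAGGACGAG"

print(translate(dna_1), translate(dna_2))
print(round(percent_identity(dna_1, dna_2), 1))
print(percent_identity(translate(dna_1), translate(dna_2)))
```

The DNA strings disagree at several positions while the translated peptides are identical, which is the pattern the slide describes: BLASTn sees the differences, BLASTp sees the conserved product.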

23

We will use a protein BLAST tool, BLASTp, to find homologous proteins. We first need to find the protein sequence encoded by the human ACTA1 gene on the NCBI protein page.

24

From the ACTA1 protein information page, select FASTA, then copy and paste the amino acid sequence into a Word Document.

>gi|49168518|emb|CAG38754.1| ACTA1 [Homo sapiens]
MCDEDETTALVCDNGSGLVKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLKYPIEHGIITNWDDMEKIWHHTFYNELRVAPEEHPTLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQAVLSLYASGRTTGIVLDSGDGVTHNVPIYEGYALPHAIMRLDLAGRDLTDYLMKILTERGYSFVTTAEREIVRDIKEKLCYVALDFENEMATAASSSSLEKSYELPDGQVITIGNERFRCPETLFQPSFIGMESAGIHETTYNSIMKCDIDIRKDLYANNVMSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWITKQEYDEAGPSIVHRKCF

Each amino acid is represented by a particular letter
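A FASTA record is a header line starting with ">" followed by sequence lines of one-letter amino acid codes. The sketch below shows one simple way to read such text and count the letters. The `parse_fasta` helper is written for this example, and the sample keeps only the first 20 residues of the ACTA1 sequence above to stay short.

```python
from collections import Counter


def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text.
    Sequence lines that follow a header are joined into one string."""
    records = {}
    header = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# Truncated sample: header from the slide, first 20 residues only.
sample = """>gi|49168518|emb|CAG38754.1| ACTA1 [Homo sapiens]
MCDEDETTAL
VCDNGSGLVK
"""

records = parse_fasta(sample)
for header, sequence in records.items():
    print(header)
    print(len(sequence), "residues")
    print(Counter(sequence).most_common(3))
```

Keeping the header and sequence on separate lines matters: if they run together, a tool may read the header text as amino acids.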

25

Navigate to the BLASTp link on NCBI.

26

Paste the protein sequence for ACTA1 here.

Enter Arabidopsis thaliana for the search database.

Select blastp and then click on the BLAST button.
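The same search can, in principle, be described as a set of request parameters instead of web-form clicks. The sketch below only builds the parameter string and sends nothing. The parameter names (`CMD`, `PROGRAM`, `DATABASE`, `ENTREZ_QUERY`, `QUERY`) follow NCBI's BLAST URL API as I understand it; treat them, the `nr` database choice, and the organism filter syntax as assumptions to check against NCBI's current documentation.

```python
from urllib.parse import urlencode

# Assumed endpoint for the NCBI BLAST URL API; verify before real use.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastp_request(protein_sequence, organism="Arabidopsis thaliana"):
    """Return the URL-encoded form body for a blastp submission.
    Nothing is sent; this only mirrors the choices made on the web form."""
    params = {
        "CMD": "Put",                               # submit a new search
        "PROGRAM": "blastp",                        # protein query vs protein database
        "DATABASE": "nr",                           # assumed database choice
        "ENTREZ_QUERY": f"{organism}[Organism]",    # restrict hits to one organism
        "QUERY": protein_sequence,
    }
    return urlencode(params)


# First 20 residues of the ACTA1 sequence from the slide, for brevity.
body = build_blastp_request("MCDEDETTALVCDNGSGLVK")
print(BLAST_URL)
print(body)
```

For classroom use the web form is simpler. A scripted request is mainly useful when the same search must be repeated for many proteins, and NCBI's usage guidelines for automated queries would apply.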

27

The BLASTp report is similar to the BLASTn report.

Query sequence

28

“Descriptions” shows 4 actins with the same query coverage, E-value and Ident! There appear to be 4 possible homologous proteins, but which is most similar to the human ACTA1 protein?

29

There are a number of actin proteins with high Query coverage, very low E-values and high identity. Check them all (for some whose numbers are represented more than once, check the first listing). Then select “Multiple Alignment” to directly compare those sequences.

30

Conserved amino acids are shown in red. Which differences can you find quickly?

Can you spot a deletion?

Where is an amino acid replaced by a chemically similar type?

Where is an amino acid replaced by a chemically different type?

31
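The questions about conserved positions, deletions, and chemically similar or different replacements can be asked column by column. The sketch below labels each column of a two-sequence alignment. The four chemical groups are one common simplified textbook grouping (other groupings exist), and the aligned strings are made up for illustration.

```python
# Simplified chemical grouping of the 20 amino acids (an assumption for this sketch).
GROUPS = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
GROUP_OF = {aa: group for group, members in GROUPS.items() for aa in members}


def classify_column(a, b):
    """Label one alignment column: gap, conserved, similar, or different."""
    if a == "-" or b == "-":
        return "gap"            # an insertion or deletion in one sequence
    if a == b:
        return "conserved"
    if GROUP_OF[a] == GROUP_OF[b]:
        return "similar"        # replaced by a chemically similar amino acid
    return "different"          # replaced by a chemically different amino acid


def classify_alignment(seq_1, seq_2):
    """Classify every column of two aligned, equal-length sequences."""
    return [classify_column(a, b) for a, b in zip(seq_1, seq_2)]


# Hypothetical aligned fragments ("-" marks a gap).
labels = classify_alignment("MDE-K", "MEEAL")
print(labels)
```

A D-to-E change keeps the acidic side chain and is labeled "similar", while K-to-L swaps a charged residue for a nonpolar one and is labeled "different".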

Protein sequence homology is analyzed by constructing a Distance tree of results. Check the desired “hits”, then select “Distance tree”.
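A distance tree starts from pairwise distances between aligned sequences. The sketch below computes one simple distance, the fraction of aligned positions that differ, and reports which sequence sits closest to the query. It covers only that first step: the tree-building method NCBI applies afterwards is not reproduced here, and the sequences and names are hypothetical.

```python
def p_distance(a, b):
    """Fraction of differing positions, ignoring columns where either has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_to_query(query_name, aligned):
    """Return (closest name, {name: distance}) relative to the query sequence."""
    query = aligned[query_name]
    distances = {
        name: p_distance(query, seq)
        for name, seq in aligned.items()
        if name != query_name
    }
    return min(distances, key=distances.get), distances


# Made-up aligned sequences of equal length; "-" is a gap.
aligned = {
    "query": "MKTAYIAKQR",
    "seq_A": "MKTAYIAKQK",
    "seq_B": "MRTSYLAKEK",
    "seq_C": "MKT-YIGKQR",
}
closest, distances = nearest_to_query("query", aligned)
print(closest)
print({name: round(d, 3) for name, d in distances.items()})
```

In a tree drawn from such distances, short branches connect sequences with small distances, so the closest sequence is the one joined to the query by the shortest path.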

32

Query—human ACTA1 protein

Nodes represent a shared ancestral gene

These proteins are all homologs.

33

34

Of the proteins in Arabidopsis thaliana, ACT7 has the highest identity (88%) and lowest E-value (0.0) when compared to human ACTA1.

A gene tree program predicts the presence of ancestral genes between ACT7 and ACTA1.

Is that sufficient to confirm protein homology for experimental modeling?

35

A more restricted alignment between human ACTA1 and the closest 3 Arabidopsis proteins can check that ACT7 is the protein closest to the ancestral gene.

Check “Align two or more sequences”, then copy and paste the protein sequences for ACT7, ACT8 and ACT2 into the Subject Sequence box.

36

Multiple alignment results for human ACTA1 protein and the 3 closest Arabidopsis proteins.

37

What do the distance tree results indicate?

38

Do you have enough data to use Arabidopsis ACT7 gene as a model for the human ACTA1 gene?

Discuss and report your ideas.

39

What criteria from published work indicated that these plant processes and human diseases involved homologous genes or proteins?

40

Homologous proteins will have:

• Very low E-values for sequence alignment (< 0.00001)

• >25% conserved sequence over >100 aa*

• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog

• Similar co-expression of genes for each homolog

• Similar function Gene Ontology (GO) terms

• Conserved sequences and protein domains

* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

41

Let’s find homology information and data about the Arabidopsis ACT7 gene at http://www.Araport.org. Use the pull-down menu to access the ThaleMine tool.

42

Enter information about your gene of interest, in this case, ACT7

43

Results show 1 gene, 2 articles and 1 mRNA in the database.

We are only interested in studying the gene for now, so we will select the category “Gene”, or just select the identifier for the gene from the list at right.

44

This is the Gene information sheet for the Arabidopsis thaliana ACT7 gene. How did the function listed under Curator Summary compare to your previous prediction?

45

The blue bar under Curator Summary has tabs that take you quickly to that section down the page. Click on the Homology tab.

Links to information about human homologs of ACT7.

46

Homologous proteins will have:

• Very low E-values for sequence alignment (< 0.00001)

• >25% conserved sequence over >100 aa*

• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog

• Similar co-expression of genes for each homolog

• Similar function Gene Ontology (GO) terms

• Conserved protein domains

* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

47

Compare the first (human ACTA1) and second (Arabidopsis ACT7) sequences in each alignment: far more than 25% of the amino acids match in any 100-residue region.
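The "more than 25% over 100 amino acids" criterion can be checked region by region with a sliding window. The sketch below reports the lowest identity found in any window of an alignment. It assumes two already aligned, equal-length strings, and the short toy strings and small window in the usage lines are for illustration only.

```python
def min_window_identity(seq_1, seq_2, window=100):
    """Lowest percent identity over every window-long stretch of an alignment.
    A gap character counts as a mismatch unless both sequences have it."""
    matches = [a == b for a, b in zip(seq_1, seq_2)]
    if len(matches) < window:
        raise ValueError("alignment is shorter than the window")
    current = sum(matches[:window])      # matches in the first window
    lowest = current
    for i in range(window, len(matches)):
        # Slide one position: add the new column, drop the oldest one.
        current += matches[i] - matches[i - window]
        lowest = min(lowest, current)
    return 100.0 * lowest / window


# Toy example with a 4-residue window instead of 100.
print(min_window_identity("ABABABAB", "AAAAAAAA", window=4))
```

If even the worst window stays above 25%, the criterion holds across the whole alignment rather than only in its best-matching region.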

48

Homologous proteins will have:

• Very low E-values for sequence alignment (< 0.00001)

• >25% conserved sequence over >100 aa*

• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog

• Similar co-expression of genes for each homolog

• Similar function Gene Ontology (GO) terms

• Conserved protein domains

* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

49

Actin interacts with many proteins

https://www.youtube.com/watch?v=FzcTgrxMzZk

50

ACT7 and ACTA1 proteins each interact with a variety of other proteins. Because the same protein may have one name in plants and a different name in animals, further investigation is needed to know from these data whether ACTA1 and ACT7 are interacting with identical proteins.

Arabidopsis ACT7 interacts with these proteins

Human ACTA1 interacts with these proteins
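Once plant and animal names have been matched up, comparing the two interaction lists becomes a set comparison. The sketch below translates plant partner names through a name map and then measures the overlap. Every protein name and the map itself are hypothetical placeholders. Building a real map is the "further investigation" the slide calls for.

```python
def shared_partners(plant_partners, human_partners, plant_to_human):
    """Return (shared partner names, Jaccard overlap between 0 and 1).
    Plant names without a known human equivalent are kept unchanged."""
    translated = {plant_to_human.get(name, name) for name in plant_partners}
    human = set(human_partners)
    shared = translated & human
    union = translated | human
    overlap = len(shared) / len(union) if union else 0.0
    return shared, overlap


# Placeholder names only; these are not real interaction data.
plant = {"PLANT_X1", "PLANT_X2", "PLANT_X3"}
human = {"HUMAN_Y1", "HUMAN_Y2", "HUMAN_Y4"}
name_map = {"PLANT_X1": "HUMAN_Y1", "PLANT_X2": "HUMAN_Y2"}

shared, overlap = shared_partners(plant, human, name_map)
print(sorted(shared), overlap)
```

Without the name map the two sets would share nothing, which shows why differing plant and animal names can hide a real similarity.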

51

Homologous proteins will have:

• Very low E-values for sequence alignment (< 0.00001)

• >25% conserved sequence over >100 aa*

• Protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog ??

• Similar co-expression of genes for each homolog

• Similar function Gene Ontology (GO) terms

• Conserved protein domains

* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

52

Co-expression (transcription of 2 or more genes at the same time in the same cell) is required for gene products (proteins) to work together.

http://www.frontiersin.org/files/Articles/96150/fpls-05-00426-HTML/image_m/fpls-05-00426-g001.jpg

In the image above, two differently colored fluorescent proteins are co-expressed in Arabidopsis.
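Co-expression is often summarized by how closely two genes' expression levels rise and fall together across samples. The sketch below uses the Pearson correlation for that purpose. This is a common approach but an assumption here, since the lesson does not say how the portal scores co-expression, and the expression values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of expression values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("expression does not vary, correlation is undefined")
    return covariance / (spread_x * spread_y)


# Invented expression levels for three hypothetical genes in four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # rises with gene_a
gene_c = [8.0, 6.0, 4.0, 2.0]   # falls as gene_a rises

print(round(pearson(gene_a, gene_b), 3), round(pearson(gene_a, gene_c), 3))
```

A value near 1 suggests the genes are switched on together, and a value near -1 suggests one is on when the other is off. Correlation alone does not show that the proteins physically work together.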

53

What genes are co-expressed (same time, same location) for ACT7 or ACTA1?

Arabidopsis ACT7 is co-expressed with these genes

Human ACTA1 co-expression is shown with purple lines.

54

Scientists would need to confirm that the different plant and animal names were actually the same protein.

Homologous proteins will have:

• Very low E-values for sequence alignment (< 0.00001)

• >25% conserved sequence over >100 aa*

• Protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??

• Some similar co-expression of genes for each homolog ??

• Some similar function Gene Ontology (GO) terms

• Conserved protein domains

* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

55

Gene Ontology provides information about biological process, molecular function and cellular location. Are any ACT7 GO terms similar to human ACTA1 GO terms?

Arabidopsis ACT7

Human ACTA1
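Because GO terms fall into three aspects, the comparison is easiest to read one aspect at a time. The sketch below lists the terms two genes share within each aspect. The term labels are placeholders, not the actual annotations of ACT7 or ACTA1, which should be read from the two gene pages.

```python
ASPECTS = ("biological_process", "molecular_function", "cellular_component")


def shared_go_terms(terms_1, terms_2):
    """Return {aspect: sorted list of terms annotated to both genes}."""
    return {
        aspect: sorted(set(terms_1.get(aspect, ())) & set(terms_2.get(aspect, ())))
        for aspect in ASPECTS
    }


# Placeholder annotations for two hypothetical genes.
plant_gene = {
    "biological_process": {"process_1", "process_2"},
    "molecular_function": {"function_1"},
    "cellular_component": {"location_1"},
}
human_gene = {
    "biological_process": {"process_2", "process_3"},
    "molecular_function": {"function_1"},
    "cellular_component": {"location_2"},
}

common = shared_go_terms(plant_gene, human_gene)
for aspect, terms in common.items():
    print(aspect, terms)
```

An empty list for one aspect, such as cellular location, is not surprising when comparing a plant and an animal, since some cell structures exist in only one of them.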

56

Homologous proteins will have:

• Very low E-values for sequence alignment (< 0.00001)

• >25% conserved sequence over >100 aa*

• Protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??

• Some similar co-expression of genes for each homolog ??

• Some similar function Gene Ontology (GO) terms

• Conserved protein domains

* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

57

58

Homologous proteins will have:

• Very low E-values for sequence alignment (< 0.00001)

• >25% conserved sequence over >100 aa*

• Protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??

• Some similar co-expression of genes for each homolog ??

• Some similar function Gene Ontology (GO) terms

• Conserved protein domains

* http://jura.wi.mit.edu/bio/education/hsteachers2012/form_blast_intro.pdf

59

Members of the Arabidopsis actin family of genes are homologous with each other. Does that mean that the Arabidopsis actins are homologous with human ACTA1?

60

Arabidopsis actin gene ACT7 plays an essential role in germination and root growth

The Plant Journal, Volume 33, Issue 2, pages 319-328, 16 JAN 2003. DOI: 10.1046/j.1365-313X.2003.01626.x

http://onlinelibrary.wiley.com/doi/10.1046/j.1365-313X.2003.01626.x/full#f2

Wild-type, no ACT7 mutation

Mutant ACT7+

Wild-type, no ACT7 mutation

Mutant ACT7+

We have an ACT7 mutant with an observable phenotype difference compared to the normal wild type.

61

Have we found a suitable plant research model for nemaline myopathy?

What additional information would you want?

Scientific literature searches for Arabidopsis information are easy to access in the http://www.Araport.org apps: 50 years of Arabidopsis research!

62