RNA-seq analysis of the impacts of aging in zebrafish

RNA-SEQ ANALYSIS OF AGING IN ZEBRAFISH
Nhi Hin
Project coordinators: Steve Pederson & Michael Lardelli

Hi everyone. My name is Nhi Hin, and my summer project involved doing some analysis on RNA-seq data from zebrafish brains. The aim was to uncover genes that might be involved in processes related to aging in the brain. It's desirable to understand more about brain aging, since aging is the major risk factor for neurodegenerative diseases like Alzheimer's disease. Although zebrafish might seem very different from humans, they're also vertebrates, and this means that their cell biology and developmental processes are similar to those of mammals, including humans. Consequently, we might expect that any processes involved in zebrafish brain aging might also be involved in human brain aging.

RNA Sequencing (RNA-seq) on whole brains
Alzheimer's Disease Genetics Laboratory

So here, I'm just going to quickly explain the data I worked with. In total, there were 12 wildtype zebrafish brains. These brains came from zebrafish that could be divided into two age groups: young fish, which were 6 months old, and aged fish, which were 24 months old. Within each age group, the zebrafish were divided into two tanks. The zebrafish in each tank are siblings, so we might expect their gene expression to be more similar to each other than to zebrafish in other tanks. RNA-sequencing was done on each of these 12 brains.

RNA-Seq Data Processing Pipeline
Quality checking with FastQC
Adapter trimming with AdapterRemoval
Quality checking with FastQC (again)
Alignment to the reference zebrafish genome (GRCz10) with HISAT2
Gene quantification with featureCounts
Transcript quantification with salmon and kallisto

After RNA-sequencing was done, the raw data was processed through a pipeline. This part of the project was done by Tim and double-checked by Steve from the Bioinformatics Hub, but I'll just quickly summarise the process here. First, the reads were quality checked. The adapters were trimmed, and the trimmed reads were quality checked again. The reads were then aligned to the reference zebrafish genome. The number of reads mapping to each gene locus was then quantified, and lastly, the number of reads mapping to each gene transcript was quantified.

Visualisation with glimma
Differential gene expression analysis comparing young and aged fish, using the limma package in R
95 differentially expressed genes

Ensembl Gene ID      p-value    adj. p-value
ENSDARG00000008706   3.83E-10   4.14E-06
ENSDARG00000080745   1.58E-09   8.59E-06
ENSDARG00000083063   1.44E-08   6.26E-05
ENSDARG00000087328   1.31E-07   0.0003212
ENSDARG00000103491   2.63E-07   0.0004744
ENSDARG00000085249   2.01E-07   0.0003966
ENSDARG00000076573   2.32E-10   4.146E-06
ENSDARG00000089382   5.16E-07   0.0007976
ENSDARG00000084209   3.75E-07   0.0006256
ENSDARG00000040200   1.42E-09   8.598E-06
ENSDARG00000081938   1.27E-06   0.0016286
ENSDARG00000082789   1.21E-06   0.0016286
ENSDARG00000087337   1.97E-07   0.0003966
ENSDARG00000094489   2.03E-06   0.0022070
ENSDARG00000085338   1.72E-06   0.0020641

After running through this pipeline, Tim did an analysis to identify the genes which were differentially expressed between young and aged fish. The analysis suggested that 95 genes were differentially expressed between the two groups. However, one really cool thing that I learned from Steve was that the differentially expressed genes could be visualised in a more intuitive way using the glimma package in R. So I might just give a very quick demo of the glimma interface right now.

Alright, so this is the glimma interface. Here is the list of genes. We can sort them by ascending p-value to look at the differentially expressed genes. When you click on a gene, the plot over on the right changes, and you can see the difference in the expression of that gene between the samples in the different tanks and across different ages. All of the genes are also represented on this plot on the right, which shows the relationship between the fold change and counts of the genes. The differentially expressed genes are shown in red.

Differential Gene Expression Findings
There were 95 differentially expressed genes between young and aged fish (logFC > 1; adj. p-value < 0.05).
Using a Fisher's exact test, non-coding genes (e.g. lincRNA, snRNA, scaRNA) appeared to be over-represented in the differentially expressed genes.
This is consistent with observations from whole-rat-brain RNA-seq analysis (Wood et al. 2013) and changes in splicing that occur with human brain aging (Mazin et al. 2013; Holoch & Moazed 2015).

In total, there were 95 genes that appeared to be differentially expressed between young and aged fish. Using a Fisher's exact test, non-coding genes appeared to be over-represented in the differentially expressed genes, suggesting that they might play important roles in aging processes. This is consistent with some other studies. A whole-rat-brain RNA-seq analysis, very similar to the data I used, except with rats, also found an over-representation of non-coding genes. In addition, there is evidence that changes in splicing occur with human brain aging, and that some of these changes are mediated by non-coding RNAs.
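To make the over-representation test a bit more concrete, here is a minimal Python sketch of a one-sided Fisher's exact test on a 2x2 table of coding versus non-coding genes. The function name and all of the counts are hypothetical and chosen only for illustration; the real counts of non-coding genes are not given in this talk, and the original analysis was done in R.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns the probability of seeing `a` or more in the top-left cell
    when the row and column totals are fixed and there is no association.
    """
    row1 = a + b            # e.g. all differentially expressed genes
    col1 = a + c            # e.g. all non-coding genes
    total = a + b + c + d   # every gene that was tested
    ways = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return ways / comb(total, row1)


# HYPOTHETICAL counts, for illustration only (not from the analysis):
de_noncoding, de_coding = 30, 65            # 95 differentially expressed genes
rest_noncoding, rest_coding = 2970, 16935   # the remaining genes tested
p = fisher_exact_greater(de_noncoding, de_coding, rest_noncoding, rest_coding)
print(f"one-sided p-value: {p:.3g}")
```

A small p-value here would mean that non-coding genes turn up among the differentially expressed genes more often than expected by chance, given how common they are among all tested genes.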

A summary of what I did
Check that the coverage of reads looked normal (just in case artifacts were responsible for some of the differentially expressed genes).
Gene ontology enrichment analysis
Differential transcript expression analysis

So just to clarify again, the RNA-seq data processing and differential gene expression analysis from before were done by Tim and Steve from the Bioinformatics Hub. So what did I work on?

First, I backtracked a bit to check that the coverage of reads mapping to each gene looked normal. This is important, because artifacts are capable of making a gene appear to be differentially expressed, even when it isn't in reality.

Secondly, I did a gene ontology enrichment analysis to look at possible biological processes that might differ between the young and aged fish.

Lastly, I did a differential transcript expression analysis. This is important if we want to see whether aging might result in differential splicing of genes.

1 - Visualising read coverage of differentially expressed genes
Gviz was used to visualise read coverage from the .bam files (containing sequence alignment data).
Shiny was used to make the Gviz plot interactive.
Conclusion: Read coverage looks normal.

So first, visualising the read coverage of differentially expressed genes. Remember, this is important to ensure that the differentially expressed genes are likely to be from a true biological effect and not from artifacts or sequencing errors.

I used the following packages in R: Gviz, to visualise the read coverage, and Shiny, to make the Gviz plot interactive. The overall conclusion was that the read coverage appeared normal.

Now I'm just going to quickly show you the interface I made. This was my first time using Shiny by the way, so please don't expect anything flashy.

Alright, so here is the interface. You can select a gene from this list here. This list is just the list of differentially expressed genes from Tim's analysis. Here, you can select the range of the y-axis, which represents the number of reads mapped. You might notice it goes into the negatives, and that happens if transcription occurs on the reverse strand. After you select a gene, the Gviz plot should show up pretty soon.

Alright, so it's finished loading. Sorry it's so slow, by the way. On the left, we have some basic information about the gene. On the right, we have the Gviz plot. There are four main plots over here. Each plot corresponds to one tank, so A, B, C and D. And each plot actually contains 3 lines, as you can see, and each line corresponds to one sample.

For normal coverage, we would expect each sample to have an overall similar read coverage pattern across the gene. It's OK if they're not exactly the same in intensity, since the size of each library might be slightly different. We also expect peaks at the exons and minimal coverage along introns, which is what can be seen here.

If we have abnormally large peaks in a small section of the gene, this indicates that many reads are mapping there, suggesting artifacts that might be caused by sequencing errors or errors in PCR amplification. It's those cases which would be worrying, because then those artifacts might cause the gene to appear differentially expressed, even though it really isn't.

Fortunately, read coverage looked normal for the genes in this analysis.

2 - Gene ontology enrichment analysis
goana() from the limma R package

Term         Number of genes   Number of DE genes   P-value    Description
GO:0009416   93                4                    5.64E-05   response to light stimulus
GO:0007623   43                3                    1.25E-04   circadian rhythm
GO:0004896   45                3                    1.43E-04   cytokine receptor activity
GO:0009314   125               4                    1.78E-04   response to radiation
GO:0048511   52                3                    2.20E-04   rhythmic process
GO:0009648   11                2                    2.73E-04   photoperiodism
GO:0042752   11                2                    2.73E-04   regulation of circadian rhythm
GO:0000982   60                3                    3.37E-04   transcription factor activity, RNA polymerase II core promoter region sequence-specific binding
GO:0001637   14                2                    4.49E-04   G-protein coupled chemoattractant receptor activity
GO:0004950   14                2                    4.49E-04   chemokine receptor activity

Overall, in aged fish:
Increased inflammatory response
Decreased unsaturated fatty acid biosynthesis
Decreased metabolism
(Consistent with Wood et al. 2013)

Secondly, I did the gene ontology enrichment analysis between the young and aged fish.

The purpose of this analysis was to get insight into biological processes that might be affected by aging.

Very briefly, the ontology of a gene is a set of keywords and phrases that give us insight into the gene's potential function. The analysis basically involves finding the terms for each gene, and then seeing which terms are over-represented, or enriched, in the genes that were differentially expressed between the young and aged fish.
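As a rough sketch of that idea (this is not the goana code itself), the Python below counts, for each term, how many tagged genes are differentially expressed and scores the overlap with a one-sided hypergeometric test, which is my understanding of the kind of test goana applies by default. The gene names, term names and annotations are all made up for illustration.

```python
from math import comb


def upper_tail(n_universe, n_term, n_de, n_overlap):
    """P(overlap >= n_overlap) when n_de genes are drawn at random
    from a universe in which n_term genes carry the term."""
    hits = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_de - k)
        for k in range(n_overlap, min(n_term, n_de) + 1)
    )
    return hits / comb(n_universe, n_de)


def enrich(term_to_genes, de_genes, universe):
    """Return (term, genes tagged, DE genes tagged, p-value), best first."""
    de_genes = de_genes & universe
    rows = []
    for term, genes in term_to_genes.items():
        tagged = genes & universe
        overlap = len(tagged & de_genes)
        p = upper_tail(len(universe), len(tagged), len(de_genes), overlap)
        rows.append((term, len(tagged), overlap, p))
    return sorted(rows, key=lambda row: row[3])


# HYPOTHETICAL toy data: 20 genes, 4 of them differentially expressed.
universe = {f"g{i}" for i in range(20)}
de_genes = {"g0", "g1", "g2", "g3"}
term_to_genes = {
    "TERM:A": {"g0", "g1", "g2", "g10"},
    "TERM:B": {"g3", "g11", "g12", "g13", "g14", "g15"},
}
results = enrich(term_to_genes, de_genes, universe)
for term, n_genes, n_de, p in results:
    print(term, n_genes, n_de, f"{p:.3g}")
```

Each output row has the same shape as a row of the goana table: the term, the number of genes tagged with it, the number of those that are differentially expressed, and a p-value.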

Here are some of the gene ontology terms that were enriched.

You can see here that the goana output gives the GO term, along with the number of genes and differentially expressed genes which were tagged with the term, and a p-value for the term's enrichment in the differentially expressed genes. In summary, aged fish appeared to have an increased inflammatory response, decreased unsaturated fatty acid biosynthesis, and decreased metabolism.

These sound reasonable for aging, although I admit they are a bit vague. These results were also consistent with a similar study that looked at whole-brain RNA-seq data from aged and young rat brains.

An aside: Why use Entrez Gene IDs?

Example gene: slc12a8
Ensembl Gene ID: ENSDARG00000074384
Entrez Gene ID: 100000006
Using biomaRt and GO.db, I retrieved the GO terms associated with each gene, first using the gene's Entrez ID, then the gene's Ensembl ID.

Just as an aside, when I used goana to do the gene ontology analysis, I was wondering why Entrez gene IDs had to be supplied. The genes in my data were originally named with Ensembl gene IDs, so I wondered why these couldn't be used instead.

You might remember from the previous slide that for each GO term, goana also gave the number of genes that were tagged with that GO term. Well, I tried to see if this number could be verified by manually getting the GO terms with biomaRt and GO.db.

First, I used Entrez gene IDs to grab the GO terms from biomaRt, and then did the same with Ensembl gene IDs. However, unexpectedly, the list of GO terms associated with Ensembl gene IDs was significantly shorter.

This seemed to suggest that fewer GO terms were mapped to Ensembl gene IDs.

Missing GO terms from some Ensembl gene IDs when using biomaRt?

Entrez ID   Number of GO Terms   Ensembl ID           Number of GO Terms
100000006   4                    ENSDARG00000074384   4
100000009   9                    ENSDARG00000075648   NA
100000026   8                    ENSDARG00000074791   NA
100000034   4                    ENSDARG00000103466   4
100000044   11                   ENSDARG00000062672   NA
100000058   7                    ENSDARG00000056652   NA
100000073   9                    ENSDARG00000090387   9
100000079   1                    ENSDARG00000075015   1
100000083   1                    ENSDARG00000089519   NA
100000085   3                    ENSDARG00000055504   3
100000095   2                    ENSDARG00000100052   2
100000098   1                    ENSDARG00000097203   1
100000101   3                    ENSDARG00000053509   3
100000104   4                    ENSDARG00000062467   4
100000122   2                    ENSDARG00000003984   2
100000125   1                    ENSDARG00000027738   1
100000131   2                    ENSDARG00000074469   2
100000132   9                    ENSDARG00000077371   9
100000135   2                    ENSDARG00000078139   NA
100000140   5                    ENSDARG00000055813   5
100000148   9                    ENSDARG00000053315   9
100000151   1                    ENSDARG00000091513   NA
100000160   15                   ENSDARG00000069105   NA
100000165   8                    ENSDARG00000105296   8
100000190   3                    ENSDARG00000089138   3
100000199   6                    ENSDARG00000041165   NA
100000219   2                    ENSDARG00000101975   2
100000223   1                    ENSDARG00000020952   1
100000230   1                    ENSDARG00000079745   1
100000242   6                    ENSDARG00000040971   6
100000252   9                    ENSDARG00000034409   NA
100000303   3                    ENSDARG00000059950   3
100000322   15                   ENSDARG00000014496   15
100000325   1                    ENSDARG00000099049   1
100000326   1                    ENSDARG00000069525   1

In fact, you can see this right here. This is a table showing the number of GO terms associated with each gene. On the left, I've got the Entrez gene IDs, and on the right, the equivalent Ensembl gene ID for the same gene. Surprisingly, I found that some Ensembl gene IDs did not seem to be associated with any GO terms, according to biomaRt. However, when using the Entrez gene ID, all GO terms were retrieved. This was a bit strange, as using the Ensembl ID on the Ensembl website brought up all of the GO terms. It was only when using the Ensembl IDs with biomaRt through R that GO terms went missing, like you see here.

3 - Differential Transcript Expression Analysis
[Figure: alternative splicing diagram. Exons are labelled A, B, C and D; the exon groupings shown are A B C D, A B D and A C D. Legend: Exon, Intron.]
Alternative splicing produces different transcripts from the same gene. Looking at differential transcript expression patterns might reveal changes in splicing between young and aged fish.

Alright, so the last part of my project involved doing a differential transcript expression analysis. Just to briefly summarise, alternative splicing may produce different transcripts from the same gene locus. Doing a differential transcript expression analysis would allow us to see possible changes in splicing between the young and aged fish. In contrast, differential gene expression, which was done earlier, might not be able to detect these changes if the overall expression level of that gene was similar across aged and young fish.
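A small worked example might help here. The Python sketch below uses made-up expression values for one gene with two transcripts whose usage swaps between young and aged fish. The numbers are hypothetical and only there to show the arithmetic.

```python
from math import log2

# HYPOTHETICAL expression values for one gene with two transcripts.
young = {"transcript_1": 80.0, "transcript_2": 20.0}
aged = {"transcript_1": 20.0, "transcript_2": 80.0}


def log2_fold_change(aged_value, young_value):
    """log2 of the aged-to-young expression ratio."""
    return log2(aged_value / young_value)


# Gene-level view: sum over transcripts, then compare aged with young.
gene_lfc = log2_fold_change(sum(aged.values()), sum(young.values()))

# Transcript-level view: compare each transcript on its own.
transcript_lfc = {t: log2_fold_change(aged[t], young[t]) for t in young}

print("gene log2FC:", gene_lfc)
print("transcript log2FC:", transcript_lfc)
```

With these values the gene total is 100 in both age groups, so the gene-level log fold change is 0, while the two transcripts change fourfold in opposite directions. That is the kind of switch a gene-level analysis would not pick up.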

List of differentially expressed transcripts and the genes which they map to (170 unique genes):
(To produce this list, I used salmon for transcript quantification, wasabi to convert salmon output for use with sleuth, and sleuth for differential transcript analysis.)

Ensembl Transcript ID   p-value    q-value    Ensembl Gene ID
ENSDART00000163764.1    7.80E-12   2.87E-07   ENSDARG00000104790
ENSDART00000010211.6    5.62E-11   1.03E-06   ENSDARG00000008706
ENSDART00000152464.2    3.55E-10   4.35E-06   ENSDARG00000017984
ENSDART00000157321.2    4.03E-09   3.70E-05   ENSDARG00000046157
ENSDART00000135249.1    9.33E-09   5.72E-05   ENSDARG00000076573
ENSDART00000138088.1    7.79E-09   5.72E-05   ENSDARG00000068745
ENSDART00000062885.4    2.56E-08   8.67E-05   ENSDARG00000042845
ENSDART00000145449.1    2.33E-08   8.67E-05   ENSDARG00000025581
ENSDART00000131063.3    2.37E-08   8.67E-05   ENSDARG00000089877
ENSDART00000140694.1    2.77E-08   8.67E-05   ENSDARG00000092115
ENSDART00000064032.4    2.82E-08   8.67E-05   ENSDARG00000043608
ENSDART00000101841.3    1.87E-08   8.67E-05   ENSDARG00000058454
ENSDART00000167145.1    3.07E-08   8.68E-05   ENSDARG00000031496
ENSDART00000162337.1    3.59E-08   8.81E-05   ENSDARG00000104293
ENSDART00000164726.1    3.39E-08   8.81E-05   ENSDARG00000104305

List of 95 differentially expressed genes (Tim's analysis earlier):

Ensembl Gene ID      p-value    adj. p-value
ENSDARG00000008706   3.83E-10   4.14E-06
ENSDARG00000080745   1.58E-09   8.59E-06
ENSDARG00000083063   1.44E-08   6.26E-05
ENSDARG00000087328   1.31E-07   0.0003212
ENSDARG00000103491   2.63E-07   0.0004744
ENSDARG00000085249   2.01E-07   0.0003966
ENSDARG00000076573   2.32E-10   4.146E-06
ENSDARG00000089382   5.16E-07   0.0007976
ENSDARG00000084209   3.75E-07   0.0006256
ENSDARG00000040200   1.42E-09   8.598E-06
ENSDARG00000081938   1.27E-06   0.0016286
ENSDARG00000082789   1.21E-06   0.0016286
ENSDARG00000087337   1.97E-07   0.0003966
ENSDARG00000094489   2.03E-06   0.0022070
ENSDARG00000085338   1.72E-06   0.0020641

Using tools like salmon, wasabi and sleuth, we can generate a list of the transcripts which are differentially expressed between young and aged fish. This list is shown on the left. The blue column shows the gene which the transcript maps to.

On the right is a list of the differentially expressed genes identified from Tim's analysis earlier.

One question we might be interested in asking is how much overlap exists between the lists of differentially expressed transcripts and differentially expressed genes.

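Overlap between the genes mapping to differentially expressed transcripts, and differentially expressed genes
[Venn diagram: 125 genes found only via differentially expressed transcripts, 45 genes in both lists, 50 genes found only in the differentially expressed gene list.]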

Here's a very quick representation of the overlap between the genes mapping to differentially expressed transcripts and the differentially expressed genes. We can see that the differentially expressed transcripts map to 170 unique genes, as shown by the blue circle, which is nearly twice the number of differentially expressed genes (95), shown as the green circle. 45 of the genes with differentially expressed transcripts are also in the list of differentially expressed genes.
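The three numbers in the Venn diagram come from simple set operations. Here is a minimal Python sketch using placeholder gene IDs (not the real Ensembl IDs) that are arranged only so the set sizes match the counts on the slide: 170 genes from differentially expressed transcripts, 95 differentially expressed genes, and 45 shared.

```python
# PLACEHOLDER gene IDs, arranged only to reproduce the set sizes on the slide.
transcript_genes = {f"gene{i}" for i in range(170)}      # genes with DE transcripts
de_genes = {f"gene{i}" for i in range(125, 220)}         # DE genes (gene-level)

both = transcript_genes & de_genes             # in both lists
only_transcripts = transcript_genes - de_genes  # transcript-level only
only_genes = de_genes - transcript_genes        # gene-level only

print(len(only_transcripts), len(both), len(only_genes))
```

With the real data, the two sets would simply be the Ensembl gene IDs from each results table.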

Unfortunately, this is pretty much where I got up to before I ran out of time, so I haven't yet had the chance to explore what these genes might be, or any relationships between them.

Future Work
Retrieve the genes associated with the enriched GO terms and look for possible relationships or interactions between these genes.
Repeat the differential transcript expression analysis in limma to ensure the statistical model is the same as that used in Tim's differential gene expression analysis.
Explore the differentially spliced genes (transcripts) that occur with aging in more detail.

So I'm just going to conclude with some questions that still need to be answered in future work.

References

Holoch, D. and Moazed, D., 2015. RNA-mediated epigenetic regulation of gene expression. Nature Reviews Genetics, 16(2), pp.71-84.

Mazin, P., Xiong, J., Liu, X., Yan, Z., Zhang, X., Li, M., He, L., Somel, M., Yuan, Y., Chen, Y.P.P. and Li, N., 2013. Widespread splicing changes in human brain development and aging. Molecular Systems Biology, 9(1), p.633.

Wood, S.H., Craig, T., Li, Y., Merry, B. and de Magalhães, J.P., 2013. Whole transcriptome sequencing of the aging rat brain reveals dynamic RNA changes in the dark matter of the genome. Age, 35(3), pp.763-776.