A Big Thanks: Prof. Jason Bohland, Quantitative Neuroscience Laboratory, Boston University
Slide 6
The Process: Construction and Representation of the Anatomic Gene Expression Atlas (AGEA).
Slide 7
Allen Reference Atlas
Slide 8
3D Nissl volume: comes from a rigid reconstruction. Each section is reoriented to match adjacent images as closely as possible. A low-resolution (1.5 T) 3D average MRI volume is used to ensure that the reconstruction is realistic. The reoriented Nissl sections are down-sampled and converted to grayscale, giving an isotropic 25 µm grayscale volume.
Slide 9
Anatomy: 208 large structures and structural groupings were extracted, then projected and smoothed onto the 3D atlas volume for structural annotation. The cortex was further decomposed into an intersection of 202 regions and areas.
Slide 10
The Process: Construction and Representation of the Anatomic Gene Expression Atlas (AGEA).
Slide 11
In Situ Hybridization (ISH): each gene's ISH series is reconstructed from serial sections (200 µm spacing). Example images: coronal section, sagittal section.
Slide 12
Why ISH? The phenotypic properties of cells result from the unique combination of gene products they express, so gene expression profiles define cell types.
Slide 13
6 genes on 1 brain: each gene on 56 sections; 2 sections are for Nissl.
Slide 14
8 genes on 1 brain: each gene on 20 sections.
Slide 15
ISH Tissue Preparation & Imaging Process: sectioning, staining (non-isotopic, digoxigenin (DIG) based), washing, imaging.
Slide 16
ISH Probe Preparation
Slide 17
Traditional Approach vs. ISH
Histology: one gene at a time. For 20,000 genes this needs 20,000 × (5 or 14) slides, roughly 1 year.
DNA microarrays & SAGE: applied to large brain regions, so they cannot differentiate neuronal subtypes (Kamme, F. et al., J. Neurosci. (2003); Sugino, K. et al., Nature Neurosci. (2006)).
In situ hybridization: measures expression and preserves spatial information for a single gene, at fine resolution (cellular, but not single-cell). The data can be used to analyze gene expression, gene regulation, CNS function (spatial), and cellular phenotype (spatial).
Slide 18
Reproducibility: an inbred mouse strain is used for all genes. Although different mice are used for different genes, expression measured under the same environmental conditions is reproducible.
Slide 19
Is ISH Reproducible? The primary sources of variation are the riboprobes, day-to-day variability, and biological variability between brains. Even with inbred mice, variation between brains is significant.
Slide 20
Processing, Expression Statistics, Reconstruction: 3D data are accessed through a standard coordinate system of (200 µm)³ voxels. The ontology of the Allen Reference Atlas is used to label individual voxels.
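To make the coordinate system concrete, here is a minimal sketch of mapping a point given in micrometers to the (200 µm)³ voxel that contains it and reading that voxel's ontology label. It assumes the grid origin sits at the corner of voxel (0, 0, 0) and that labels are stored as integer structure IDs in a 3D annotation array; the function names, the ID-to-name dictionary, and the origin convention are hypothetical, not the Allen Institute's actual API or file layout.

```python
import numpy as np

VOXEL_UM = 200.0  # grid spacing from the slide: 200 µm isotropic voxels

def voxel_index(point_um, origin_um=(0.0, 0.0, 0.0), voxel_um=VOXEL_UM):
    """Map a 3D point (µm) to integer (i, j, k) voxel indices.

    Assumption: the grid origin is the corner of voxel (0, 0, 0); the real
    atlas origin and axis order are not given in the slides.
    """
    offset = np.asarray(point_um, dtype=float) - np.asarray(origin_um, dtype=float)
    return tuple(int(v) for v in np.floor(offset / voxel_um))

def label_at(point_um, annotation, names):
    """Look up the ontology label of the voxel containing point_um.

    annotation: 3D integer array of structure IDs (0 = outside the brain).
    names: dict from structure ID to name (a hypothetical toy ontology).
    """
    i, j, k = voxel_index(point_um)
    if not all(0 <= n < s for n, s in zip((i, j, k), annotation.shape)):
        return "outside volume"
    return names.get(int(annotation[i, j, k]), "outside brain")
```

For example, the point (2050, 1010, 1599) µm falls in voxel (10, 5, 7), because 2050 / 200 = 10.25, 1010 / 200 = 5.05 and 1599 / 200 = 7.995 are each floored.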
Slide 21
Grid-Based Nearest Plane
Slide 22
Registration: volumes are iteratively registered to the Allen Brain (AB) atlas using affine and locally nonlinear warping. Registration is accurate to ~200 µm. Figure: example local deformation field.
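The slide does not say how the affine stage is computed. As an illustration of the affine part only, the sketch below fits a 3D affine transform to corresponding landmark points by least squares with `numpy.linalg.lstsq`. Landmark-based fitting is an assumption chosen for clarity; the actual registration pipeline may instead optimize an image-similarity measure, and the locally nonlinear warping step is not shown.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 3D affine transform mapping src points onto dst points.

    src, dst: (n, 3) arrays of corresponding points (n >= 4, not coplanar).
    Returns (A, t) such that dst ≈ src @ A.T + t.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous coordinates: each row is [x, y, z, 1].
    X = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve X @ P ≈ dst for the 4x3 parameter matrix P.
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)
    A = P[:3].T  # 3x3 linear part (rotation, scaling, shear)
    t = P[3]     # translation vector
    return A, t

def apply_affine(points, A, t):
    """Apply the fitted transform to an (n, 3) array of points."""
    return np.asarray(points, dtype=float) @ A.T + t
```

An affine map has 12 parameters (9 linear, 3 translation), so at least four non-coplanar point pairs are needed; extra pairs make the fit a least-squares compromise.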
Slide 23
Slide 24
3D Annotation
Slide 25
Lower-dimensional data volumes: we analyze binned expression volumes at (200 µm)³ resolution. There are ~31,000 image series (mostly single-hemisphere, sagittal series), and 4,104 unique genes are available from coronally sectioned brains. Each volume is 67 × 41 × 58 voxels (about 50k brain voxels), comparable to fMRI resolution.
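Binning to 200 µm can be pictured as block averaging. If the fine volume is on the 25 µm grid of the reference atlas, each 200 µm voxel covers an 8 × 8 × 8 block. The sketch below averages non-overlapping blocks with a NumPy reshape; dropping incomplete edge blocks is an assumption, and the real pipeline's aggregation of segmented ISH pixels is not reproduced. Note also that the full 67 × 41 × 58 grid has 159,326 voxels, of which only about 50k lie inside the brain.

```python
import numpy as np

def bin_volume(vol, factor=8):
    """Average non-overlapping factor**3 blocks of a 3D volume.

    With 25 µm input voxels, factor=8 gives 200 µm output voxels.
    Trailing slices that do not fill a whole block are dropped
    (an assumption; a real pipeline might pad instead).
    """
    nx, ny, nz = (s // factor for s in vol.shape)
    trimmed = vol[: nx * factor, : ny * factor, : nz * factor]
    # Split each axis into (block index, position within block) ...
    blocks = trimmed.reshape(nx, factor, ny, factor, nz, factor)
    # ... then average over the within-block positions.
    return blocks.mean(axis=(1, 3, 5))
```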
Slide 26
Data normalization: background correction & registration. Intensity normalization: correct the background using a negative control. Registration: map the image to the reference atlas. Smoothed expression energy: sum of intensities of expressing cells / number of cells in the voxel, i.e. an average over many cells of diverse types.
Slide 27
ISH Signal: (c) Coronal-plane in situ hybridization (ISH) image of the gene tachykinin 2 (Tac2) from the Allen Brain Atlas, showing enriched expression in the bed nucleus of the stria terminalis (BST). The box represents a 1 mm² square. (d) Enlarged expression-mask view of the boxed area in (c), depicting gene expression levels color-coded by ISH signal intensity (red, higher expression level; green/blue, lower expression level).
Slide 28
Measurements: p is an image pixel in voxel C, and |C| is the total number of pixels in C. M(p) is the expression segmentation mask: 1 for an expressing pixel, 0 for a non-expressing pixel. I(p) is the grayscale value of the ISH image intensity, Gray = 0.3 × Red + 0.59 × Green + 0.11 × Blue.
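The grayscale formula is a weighted sum over the color channels, so it can be applied to a whole RGB image at once. A minimal sketch, assuming the image is an array with the channels last in R, G, B order (these weights are a rounded form of the ITU-R BT.601 luma coefficients 0.299, 0.587, 0.114):

```python
import numpy as np

def to_gray(rgb):
    """Grayscale intensity I(p) = 0.3*R + 0.59*G + 0.11*B, as on the slide.

    rgb: array of shape (..., 3), channels in R, G, B order (assumed layout).
    """
    weights = np.array([0.3, 0.59, 0.11])
    return np.asarray(rgb, dtype=float) @ weights  # contracts the channel axis
```

Because the weights sum to 1, a pure white pixel (255, 255, 255) keeps intensity 255.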
Slide 29
Per-Gene Signature (Prox1). Figure panels: coronal section; sagittal section; Prox1 volume maximum intensity projections of the raw ISH and of the expression energy.
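A maximum intensity projection keeps, for each line of sight through the volume, the brightest voxel along that line. A minimal sketch for a 3D NumPy volume; which axis corresponds to the coronal, sagittal, or horizontal view depends on the volume's axis order, which is assumed here rather than taken from the slides.

```python
import numpy as np

def max_intensity_projections(vol):
    """Maximum intensity projection of a 3D volume along each of its axes.

    Returns {axis: 2D image}; mapping axes to anatomical views is an assumption.
    """
    vol = np.asarray(vol)
    return {axis: vol.max(axis=axis) for axis in range(3)}
```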
Slide 30
Expression Measures (Recap: Measurements)
expression density = number of expressing pixels / number of all pixels in the division
expression intensity = sum of expressing-pixel intensities / number of expressing pixels
expression energy = sum of expressing-pixel intensities / number of all pixels in the division = density × intensity
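In the notation of the Measurements slide, density = Σ M(p) / |C|, intensity = Σ M(p) I(p) / Σ M(p), and energy = Σ M(p) I(p) / |C|, so energy = density × intensity whenever at least one pixel expresses. A minimal sketch for one voxel or division, assuming the mask and intensities are given as equal-shaped arrays; returning 0 intensity for a voxel with no expressing pixels is a convention chosen here, not taken from the slides.

```python
import numpy as np

def expression_measures(mask, intensity):
    """Expression density, intensity, and energy for one voxel/division C.

    mask: M(p), 1 for expressing pixels and 0 otherwise.
    intensity: I(p), grayscale ISH intensity, same shape as mask.
    """
    m = np.asarray(mask, dtype=float)
    i = np.asarray(intensity, dtype=float)
    n_pixels = m.size          # |C|
    n_expr = m.sum()           # sum over p of M(p)
    expr_sum = (m * i).sum()   # sum over p of M(p) * I(p)
    density = n_expr / n_pixels
    mean_intensity = expr_sum / n_expr if n_expr > 0 else 0.0
    energy = expr_sum / n_pixels  # equals density * mean_intensity
    return float(density), float(mean_intensity), float(energy)
```

For a 2 × 2 voxel with mask [[1, 0], [1, 0]] and intensities [[100, 50], [200, 10]], density is 2 / 4 = 0.5, intensity is 300 / 2 = 150, and energy is 300 / 4 = 75 = 0.5 × 150.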
Slide 31
Metadata: each voxel can be linked to a node in a hierarchical brain atlas ontology, and also to Waxholm Space. Raw Nissl sections from the same brain (at 200 µm spacing) can also be obtained. Each gene comes with the specific probe sequence used and various identifiers linking to gene information (we've used Entrez IDs).
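Linking a voxel to a node in a hierarchical ontology means it also belongs to every ancestor of that node. A minimal sketch, assuming the ontology is stored as a child-to-parent dictionary; the structure names below are a hypothetical toy hierarchy, not the Allen Reference Atlas ontology. The roll-up function credits each voxel to its structure and all of that structure's ancestors.

```python
from collections import Counter

def ancestors(structure, parent):
    """Path from a structure up to the ontology root.

    parent: dict mapping each structure to its parent (None at the root).
    """
    path = []
    node = structure
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def rollup_counts(voxel_labels, parent):
    """Count voxels per structure, crediting each voxel to all its ancestors."""
    counts = Counter()
    for label in voxel_labels:
        for node in ancestors(label, parent):
            counts[node] += 1
    return counts
```

With the toy hierarchy {"root": None, "A": "root", "A1": "A", "B": "root"}, voxels labeled A1, A1 and B give counts of 3 for root, 2 for A, 2 for A1 and 1 for B.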
Slide 32
Deriving Insights
Slide 33
Large-scale data analysis: How much structure is present across space and across genes? How would the brain segment on the basis of gene expression patterns (as opposed to Nissl, etc.)? Is there structure in the patterns of expression of highly localized genes? What can we learn from the expression patterns of genes implicated in disorders? (See Bohland et al. (2009), Methods; Ng et al. (2009), Nature Neuroscience.)
Slide 34
Genome-wide Analysis of Expression: 70.5% of genes are expressed in fewer than 20% of cells.
Slide 35
Notes: well-established genes for different cell types were identified, as were the top 100 genes for 12 major brain regions.
Functional Compartments: genes with regional expression provide substrates for functional differences.
Slide 39
Tools from AGEA
Correlation mode: view and navigate 3-D spatial relationship maps.
Clusters mode: explore transcriptome-based spatial organization.
Gene Finder mode: search for genes with local regionality.
Slide 40
Slide 41
Spatial Transcriptome: expression energy is computed for each gene (M = 4,376) and for each voxel (N = 51,533). For each seed voxel, Pearson's correlation coefficient is computed between the seed voxel and every other voxel, using expression vectors of length M. This yields 51,533 three-dimensional correlation maps. A web viewer allows easy navigation between maps and within each 3-D map, displaying correlation values as 24-bit false color on a blue-to-red (jet) color scale.
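One correlation map can be computed in a single matrix-vector product. Center and normalize each voxel's length-M expression vector, and the dot product of two such vectors is their Pearson correlation. The sketch below assumes the data are an (N, M) NumPy array with one row per voxel. Computing one seed's map at a time avoids holding all N × N correlations in memory, and reshaping the result back onto the 3D grid is left out.

```python
import numpy as np

def correlation_map(E, seed):
    """Pearson correlation between a seed voxel and every voxel.

    E: (N, M) array of expression energies, one row per voxel and
       one column per gene.
    seed: row index of the seed voxel.
    Returns an (N,) array of correlation coefficients.
    """
    E = np.asarray(E, dtype=float)
    Z = E - E.mean(axis=1, keepdims=True)  # center each voxel's vector
    norms = np.linalg.norm(Z, axis=1)
    norms[norms == 0] = np.nan             # constant vectors: r undefined
    Z = Z / norms[:, None]                 # unit-length rows
    return Z @ Z[seed]                     # dot products = Pearson r
```

The result agrees with the seed's row of `np.corrcoef(E)`, which computes the full N × N matrix instead.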
Slide 42
Slide 43
Clusters of Correlated Gene Expression: classical definitions of brain regions are based on overall morphology, cellular cytoarchitecture, ontogenetic development, and functional connectivity.
Slide 44
Slide 45
Hierarchical clustering: voxels are spatially organized as a binary tree. Each node is a collection of voxels and has 0 or 2 branches. Initially, all 51,533 voxels are assigned to the root node of the tree. The final tree has 103,065 nodes, with a maximum depth of 53 levels and 51,533 leaf nodes (one for each voxel in the brain); this matches the fact that a binary tree in which every node has 0 or 2 children and L leaves has 2L - 1 nodes. At each bifurcation, an ordering is assigned to the children, which enables the definition of a global depth-first ordering of all leaf nodes.
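The tree above is built top-down: the root holds every voxel and each node splits into two children. The slides do not say how a node is split, so the sketch below assumes a simple 2-means split (a hypothetical choice), falling back to an arbitrary halving when all points in a node are identical. Splitting stops at single voxels. Visiting the first child before the second yields the depth-first leaf ordering, and the reported node count can be checked against the 2L - 1 property.

```python
import numpy as np

def two_means_split(X, idx, iters=10):
    """Split X[idx] into two non-empty groups using a few 2-means iterations."""
    pts = X[idx]
    # Seed one centroid at the point farthest from the mean and the other
    # at the point farthest from that first seed.
    a = pts[np.argmax(((pts - pts.mean(axis=0)) ** 2).sum(axis=1))]
    b = pts[np.argmax(((pts - a) ** 2).sum(axis=1))]
    to_b = ((pts - b) ** 2).sum(axis=1) < ((pts - a) ** 2).sum(axis=1)
    for _ in range(iters):
        if to_b.all() or not to_b.any():
            break
        a, b = pts[~to_b].mean(axis=0), pts[to_b].mean(axis=0)
        to_b = ((pts - b) ** 2).sum(axis=1) < ((pts - a) ** 2).sum(axis=1)
    if to_b.all() or not to_b.any():
        # Identical points (degenerate split): fall back to halving.
        to_b = np.arange(len(idx)) >= len(idx) // 2
    return idx[~to_b], idx[to_b]

def build_tree(X):
    """Divisive binary tree over the rows of X (one row per voxel).

    Returns (number of nodes, number of levels, leaves in depth-first order).
    """
    X = np.asarray(X, dtype=float)
    leaf_order = []
    stats = {"nodes": 0, "levels": 0}

    def split(idx, level):
        stats["nodes"] += 1
        stats["levels"] = max(stats["levels"], level)
        if len(idx) == 1:            # leaf: a single voxel
            leaf_order.append(int(idx[0]))
            return
        first, second = two_means_split(X, idx)
        split(first, level + 1)      # visiting the first child first
        split(second, level + 1)     # gives a depth-first leaf ordering

    split(np.arange(len(X)), 1)      # root (level 1) holds every voxel
    return stats["nodes"], stats["levels"], leaf_order
```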
Clusters of Correlated Gene Expression
Differentially Regulated Genes: up-regulated genes, down-regulated genes
Slide 51
Clusters?
Slide 52
Clustering Analysis: group genes that show similar temporal expression patterns; group samples or genes that show similar expression patterns.
Slide 53
Clustering Analysis: finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups. Inter-cluster distances are maximized; intra-cluster distances are minimized.
Slide 54
How many clusters? Figure: the same set of points grouped into two, four, or six clusters; the appropriate number of clusters is ambiguous.
Slide 55
Clustering Algorithms: K-means and its variants; hierarchical clustering.
Slide 56
K-means Clustering: a partitional clustering approach. Each cluster is associated with a centroid (center point), and each point is assigned to the cluster with the closest centroid. The number of clusters, K, must be specified. The basic algorithm is very simple.
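The basic K-means algorithm (Lloyd's algorithm) alternates two steps until the assignments stop changing. A minimal NumPy sketch, assuming the initial centroids are K distinct data points chosen at random (one of several possible initialization strategies), and that an emptied cluster keeps its previous centroid:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Basic K-means (Lloyd's algorithm).

    1. Pick k initial centroids (here: k distinct random data points).
    2. Assign each point to the closest centroid.
    3. Recompute each centroid as the mean of its assigned points.
    4. Repeat 2-3 until the assignments stop changing.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Because the result depends on the initial centroids, a common practice is to run several random initializations and keep the solution with the lowest sum of squared distances.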
Slide 57
Choosing Initial Centroids
Slide 58
Limitations: Differing Sizes. Figure: original points vs. K-means (3 clusters).
Slide 59
Limitations: Differing Density. Figure: original points vs. K-means (3 clusters).
Slide 60
Limitations: Non-globular Shapes. Figure: original points vs. K-means (2 clusters).
Slide 61
Hierarchical Clustering: produces a set of nested clusters organized as a hierarchical tree. It can be visualized as a dendrogram, a tree-like diagram that records the sequence of merges or splits.
Slide 62
Agglomerative Clustering: the more popular hierarchical clustering technique. The basic algorithm is straightforward:
1. Compute the proximity matrix.
2. Let each data point be a cluster.
3. Repeat: merge the two closest clusters, then update the proximity matrix.
4. Stop when only a single cluster remains.
The key operation is computing the proximity of two clusters; the different approaches to defining the distance between clusters distinguish the different algorithms.
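The agglomerative loop can be sketched directly, with the inter-cluster distance left as a parameter. For simplicity, this version recomputes each cluster-to-cluster proximity from the point-level distance matrix on every pass instead of updating a cluster-level matrix in place. That makes it much slower than the bounds on the complexity slide, so it is meant for small examples only. The `linkage` names are choices made here: "single" is MIN, "complete" is MAX, and "average" is the group average.

```python
import numpy as np

def agglomerative(X, linkage="single"):
    """Naive agglomerative clustering; returns the sequence of merges.

    Each merge is (cluster_a, cluster_b, distance), with clusters given as
    frozensets of point indices.
    """
    X = np.asarray(X, dtype=float)
    # Point-level proximity matrix (Euclidean distances).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    reduce = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    clusters = [frozenset([i]) for i in range(len(X))]  # each point is a cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                block = D[np.ix_(list(clusters[a]), list(clusters[b]))]
                d = reduce(block)  # inter-cluster proximity
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        merged = clusters[a] | clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges
```

On the 1D points 0, 1, 5, 6, 20, all three linkages first merge {0, 1} and {5, 6} at distance 1. They then join those two groups at distance 4 (MIN), 6 (MAX), or 5 (group average).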
Slide 63
In the Beginning: start with clusters of individual points and a proximity matrix. Figure: points p1 to p5 and their proximity matrix.
Slide 64
Intermediate Step: after some merging steps, we have some clusters. Figure: clusters C1 to C5 and their proximity matrix.
Slide 65
Intermediate Step: we want to merge the two closest clusters (C2 and C5) and update the proximity matrix. Figure: clusters C1 to C5 and their proximity matrix.
Slide 66
After Merging: the question is how to update the proximity matrix. Figure: clusters C1, C3, C4 and the merged cluster C2 ∪ C5; the proximity-matrix entries for C2 ∪ C5 are unknown (?).
Slide 67
Inter-Cluster Similarity: how should the similarity of two clusters be defined? Options: MIN, MAX, group average, distance between centroids. Figure: points p1 to p5 and their proximity matrix.
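The four definitions differ only in how they summarize the pairwise distances between the two clusters' points (or, for centroids, their means). A minimal sketch, assuming Euclidean distance and clusters given as arrays with one point per row:

```python
import numpy as np

def cluster_proximities(A, B):
    """Four ways to define the distance between clusters A and B (rows = points)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # All pairwise distances between a point of A and a point of B.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return {
        "MIN": float(D.min()),            # closest pair (single link)
        "MAX": float(D.max()),            # farthest pair (complete link)
        "group average": float(D.mean()), # mean over all pairs
        "centroid": float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))),
    }
```

For A = {(0, 0), (0, 2)} and B = {(3, 0), (5, 0)}: MIN is 3, MAX is √29, and the centroid distance is the distance from (0, 1) to (4, 0), i.e. √17.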
Slide 68
Slide 69
Slide 70
Slide 71
Hierarchical Clustering: Group Average. Figure: nested clusters of points 1 to 6 and the corresponding dendrogram.
Slide 75
Complexity: Time & Space. O(N²) space, since it uses the proximity matrix (N is the number of points). O(N³) time in many cases: there are N steps, and at each step the proximity matrix, of size N², must be updated and searched. Complexity can be reduced to O(N² log N) time for some approaches.
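Plugging the AGEA voxel count into these bounds shows what a naive agglomerative run over all voxels would face. The short calculation below assumes 8-byte (float64) matrix entries; storing only the distinct pairs of a symmetric matrix with a zero diagonal halves the memory.

```python
def proximity_cost(n, bytes_per_entry=8):
    """Size of a proximity matrix for n points (8 bytes assumes float64)."""
    full_entries = n * n        # full N x N matrix
    pairs = n * (n - 1) // 2    # distinct pairs: symmetric, zero diagonal
    return {
        "full_entries": full_entries,
        "pairs": pairs,
        "full_gb": full_entries * bytes_per_entry / 1e9,
        "pairs_gb": pairs * bytes_per_entry / 1e9,
        "naive_time_ops": n ** 3,  # order of magnitude for the O(N^3) bound
    }

cost = proximity_cost(51_533)      # brain voxels in the AGEA volume
print(f"{cost['full_gb']:.1f} GB full, {cost['pairs_gb']:.1f} GB condensed, "
      f"~{cost['naive_time_ops']:.1e} steps for O(N^3)")
```

That comes to about 21.2 GB for the full matrix, 10.6 GB for the condensed one, and roughly 1.4 × 10^14 steps for the O(N³) bound.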
Finding enriched genes: seed with known structure-specific genes, e.g. oligodendrocyte (Mbp, Mobp, Cnp1) and choroid plexus (Col8a2, Lbp, Msx1), then find the genes with similar expression patterns.
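The slide does not specify how similarity to the seed genes is scored. One simple possibility, sketched here as an assumption, is to average the seed genes' expression-energy patterns into a template and rank all other genes by their Pearson correlation with it across voxels. The gene-by-voxel matrix layout and the function name are hypothetical. Genes with constant expression have no defined correlation and are skipped.

```python
import numpy as np

def rank_genes_by_seed(E, gene_names, seeds):
    """Rank non-seed genes by Pearson correlation with the mean seed pattern.

    E: (n_genes, n_voxels) expression-energy matrix, one row per gene.
    gene_names: list of n_genes names; seeds: names of known marker genes.
    Returns (name, r) pairs, highest correlation first.
    """
    E = np.asarray(E, dtype=float)
    idx = {g: i for i, g in enumerate(gene_names)}
    template = E[[idx[g] for g in seeds]].mean(axis=0)  # mean seed pattern
    t = (template - template.mean()) / template.std()
    with np.errstate(invalid="ignore", divide="ignore"):  # constant rows -> nan
        Z = (E - E.mean(axis=1, keepdims=True)) / E.std(axis=1, keepdims=True)
    r = Z @ t / E.shape[1]  # Pearson r = mean product of z-scores (ddof=0)
    ranked = [(g, float(r[i])) for g, i in idx.items()
              if g not in seeds and np.isfinite(r[i])]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

With seeds Mbp and Mobp, a candidate whose pattern is a positive linear function of the seed template scores r = 1, and an inverted pattern scores r = -1.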