FLOCK: A Density-Based Clustering Method for Automated Cell Population Identification in High-Dimensional Flow Cytometry Data and the Cell Ontology
Richard H. Scheuermann, Ph.D.
Department of Pathology and Division of Biomedical Informatics
U.T. Southwestern Medical Center, Dallas, TX
TRADITIONAL FLOW ANALYSIS
Flow Cytometry (FCM)
• a.k.a. Fluorescence Activated Cell Sorting (FACS™)
• Method:
– Stain cell population with fluorescent reagents that bind to specific molecules, e.g. fluorescein-conjugated anti-CD40 antibodies
– Measure fluorescence properties of each cell using flow cytometer
• Direct and indirect measurement of individual cell characteristics, e.g. cell size, membrane protein expression, secreted protein expression, cell cycle state, DNA ploidy, signal transduction activation
Uses of Flow Cytometry (FCM)
• Differences in cell populations between specimens
• Study of normal cell activation, differentiation and function
• Study of abnormal cell activation, differentiation and function
• Isolate cells from mixture based on their molecular characteristics
• Diagnostics - leukemia, lymphoma, myeloproliferative disorders
• Novel biomarkers
[Figure: CD15 FITC plots, normal vs. leukemia. Red - Myeloblasts; Green - Granulocytes; L. Blue - Monocytes]
FCM can measure many parameters simultaneously, e.g., the BD LSR-II can produce data for up to 19 parameters for every cell in a given sample
FCM instrumentation & reagents
Traditional Flow Cytometry Analysis
• Subjective
• Time-consuming
• Doesn’t handle overlapping distributions well
• Sensitive to slight differences in fluorescence intensity distributions between samples
• Requires at least one 2D plot that clearly segregates populations in question
Goal - group together cells with similar characteristics
Traditional approach - manual gating, two dimensions at a time
Improved Approach
• Identifying cell populations automatically, objectively, and quickly in multi-dimensional flow cytometry data (eliminate manual gating)
• Quantitatively compare the identified populations across different samples and across different experiments
Characteristics of FCM Data
Data sets are:
• Large (and various) size
– From hundreds to millions of events
• Multidimensional
– 19 parameter instrument already available
• Noise and outliers
– Dead cells and dirt
Populations are different in:
• shapes
– Elongated, ellipsoid, spherical, banana shapes…
• densities
– Some cell populations are relatively sparse even in 2D space
• compositions
– Events that pile up on axis can change data distribution
• positions
– Some are very close while others are far away
• sizes
– From several events to hundreds of thousands of events
FLOCK APPROACH
Grid-based Clustering Approach
• Divide n-dimensional space with hyper-grids
• Identify dense hyper-regions
• Merge neighboring dense hyper-regions to define k populations
• Determine centroids of each population
• Cluster data using k centroids to seed
2D example
Divide with hyper-grids
Find dense hyper-regions
Merge neighboring dense hyper-regions
Clustering based on region centers
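The five steps of the grid-based approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FLOCK implementation: the function name `grid_cluster`, the equal-width binning, the parameters `bins` and `density_threshold`, and the rule that cells touching at a face or corner count as neighbors are all hypothetical choices made for the example.

```python
import itertools
from collections import deque

import numpy as np


def grid_cluster(X, bins=10, density_threshold=5):
    """Hypothetical sketch of grid-based density clustering (not FLOCK itself)."""
    X = np.asarray(X, dtype=float)
    n_dims = X.shape[1]

    # 1. Divide the space with a hyper-grid: equal-width bins per dimension.
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    cells = np.minimum((bins * (X - lo) / span).astype(int), bins - 1)

    # 2. Identify dense hyper-regions: cells holding enough events.
    keys, counts = np.unique(cells, axis=0, return_counts=True)
    dense = {tuple(k.tolist()) for k, c in zip(keys, counts)
             if c >= density_threshold}
    if not dense:
        raise ValueError("no dense hyper-regions; lower density_threshold or bins")

    # 3. Merge neighboring dense cells with a breadth-first search.
    #    The 3**n_dims neighbor offsets make this practical only in low dimensions.
    offsets = list(itertools.product((-1, 0, 1), repeat=n_dims))
    group_of, n_groups = {}, 0
    for start in sorted(dense):
        if start in group_of:
            continue
        group_of[start] = n_groups
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cell, off))
                if nb in dense and nb not in group_of:
                    group_of[nb] = n_groups
                    queue.append(nb)
        n_groups += 1

    # 4. Centroid of each group: mean of the events inside its dense cells.
    event_group = np.array([group_of.get(tuple(c), -1) for c in cells.tolist()])
    centroids = np.array([X[event_group == g].mean(axis=0)
                          for g in range(n_groups)])

    # 5. Single round of distance-based clustering seeded by the centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids
```

Calling `labels, centroids = grid_cluster(X)` on an events-by-parameters matrix returns one population index per event. Note that the number of populations falls out of the merge step rather than being supplied up front.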
FLOCK v2.0 STEPS
1. File Conversion - Convert binary .fcs file into a data matrix
2. Data Cleansing - Remove boundary events (noise) in FSC and SSC dimensions
3. Data Shrinking - Collapse data toward distribution modes
4. Normalization - Z-score normalization for values in each dimension ((x_i - µ)/SD)
5. Dimension Selection - Select most informative dimensions based on measures of dispersion and distortion
6. FLOCK LoD
i. Partition each dimension to generate a hyper-grid
ii. Identify dense hyper-regions in hyper-grid
iii. Merge neighboring dense hyper-regions to define hyper-region groups (n)
iv. Determine centroids for each hyper-region group
v. Use n centroids to seed single round of distance-based clustering
7. FLOCK HiD - Refine population definition based on histogram partitioning
8. Group Merging - Merge close hyper-region groups based on [distance metric]
9. Centroid Calculation - Compute centroid for each hyper-region group
10. Clustering - Cluster events to nearest centroid
11. Population statistics - Summarize population proportions, intensity levels, etc.
12. Visualization
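Steps 2 and 4 are simple enough to illustrate directly. The sketch below is an assumption-laden example rather than FLOCK code: the function names, the explicit `lower`/`upper` bounds used to define a "boundary event", and the use of the population standard deviation (ddof=0) are choices made here for illustration, since the slides do not specify them.

```python
import numpy as np


def remove_boundary_events(X, columns, lower, upper):
    """Keep only events strictly inside (lower, upper) in the given columns.

    `columns` would hold the FSC and SSC column indices; the bounds are
    hypothetical parameters, e.g. the detector minimum and maximum.
    """
    X = np.asarray(X, dtype=float)
    sub = X[:, columns]
    keep = np.all((sub > lower) & (sub < upper), axis=1)
    return X[keep]


def zscore(X):
    """Z-score each dimension: (x_i - mean) / SD, per column."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0)  # population SD; an assumption
    sd = np.where(sd == 0, 1.0, sd)  # avoid dividing by zero on constant columns
    return (X - mu) / sd
```

After `zscore`, every non-constant dimension has mean 0 and standard deviation 1, so no single parameter dominates the distance calculations in the later clustering steps.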
Data
• Source: University of Rochester (Sanz)
• Normal human PBMC sample stained with:
– FITC-IgD
– PE-CD1c
– PE-Alexa610-CD24
– PE-Cy5-IgG
– PerCP-Cy5.5-CD3
– PE-Cy7-B220
– PacificBlue-CD38
– PacificOrange-Aqua dead cell staining
– APC-CD27
– APC-Cy7-CD19
• 10 color; 12 parameter
• Gated on CD19+, CD3- (~67,000 events)
[Figure: population groups N1-3, UM1-2, UM3-4, PB, GSM, GNSM, DNM shown on axes CD27, IgD, B220, CD24, CD38, IgG]
17 B Cell Populations in Blood
[Figure axes: B220, CD24, CD38, IgG]
N1(B220+, CD38+)
N2(B220+, CD38-)
N3(B220low, CD38+)
Naïve B cells (CD27low, IgD+, IgG-)
Population characteristics
Population  Color         CD27  IgD  IgG  CD38  CD24  B220  Proportion  Putative cell type
N1          Gray          -     +    -    +     int   +     48.94%      naïve (CD38+) [Bm2?]
N2          Magenta       -     +    -    -     +     +     4.69%       naïve (CD38-)
N3          Purple        -     +    -    +     +     low   4.41%       naïve (CD38+B220low)
UM1         Darkred       +     +    -    +     +     +     1.55%       unswitched memory (CD38+)
UM2         Salmon        +     +    -    -     +     +     0.94%       unswitched memory (CD38-) [Bm1?]
UM3         Darkblue      +     int  -    +     +     low   6.16%       IgDlow unswitched memory (CD38+)
UM4         Green         +     int  -    -     +     low   11.50%      IgDlow unswitched memory (CD38-)
GSM1        Grayishgreen  +     +    +    +     +     +     0.36%       switching memory (IgD+IgG+CD38+)
GSM2        Yellow        +     -    +    +     +     low   4.05%       switched memory (CD38+) [early Bm5?]
GSM3        Blue          +     -    +    -     +     low   4.40%       switched memory (CD38-) [late Bm5?]
GNSM1       Cyan          +     -    -    +     +     low   4.84%       IgD-IgG- memory
GNSM2       Darkgreen     +     -    -    -     +     low   3.84%       IgD-IgG- memory
GNSM3       Teal          +     -    -    +     +     +     1.30%       IgD-IgG- memory
GNSM4       Orange        +     -    -    -     -     low   0.51%       IgD-IgG- memory
DNSM1       Pink          -     -    +    -     -     +     0.85%       double negative memory (IgG+)
DNSM2       Darkgray      -     -    -    -     -     +     0.91%       double negative memory (IgG-)
PB          Red           high  -    -    high  -     low   0.75%       plasmablasts
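The proportion column is the kind of output the population statistics step would produce once every event carries a population label. A minimal sketch, assuming a per-event label array and an events-by-markers matrix; the function name `population_summary` and its return layout are hypothetical. The slides do not describe how the +, -, int, low and high calls are derived from intensities, so the sketch reports only mean intensities.

```python
import numpy as np


def population_summary(X, labels, marker_names):
    """Per-population proportion (%) and mean intensity per marker."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for pop in np.unique(labels):
        mask = labels == pop
        summary[pop.item()] = {
            # Share of all events assigned to this population.
            "proportion_pct": 100.0 * mask.sum() / labels.size,
            # Mean intensity of each marker within the population.
            "mean_intensity": dict(zip(marker_names,
                                       X[mask].mean(axis=0).tolist())),
        }
    return summary
```

Because every event is assigned to exactly one population, the proportions sum to 100%, as they do for the 17 populations in the table.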
Summary Statistics
B cell component of the Cell Ontology
http://www.obofoundry.org/
Tube Marker Summary
Tube     Panel                       FL1   FL2    FL3  FL4
Tube 26  Major PBMC subsets and FcE  CD14  CD23   CD3  CD19
Tube 27  T cell subsets              CD4   CCR3   CD8  CCR4
Tube 28  NK & T cells                CD4   CD25   CD3  CD161
Tube 29  Naïve TH                    CD4   CD25   CD3  CD45RA
Tube 30  Memory TH                   CD4   CD25   CD3  CD45RO
Tube 31  T cell subsets              CD4   CXCR3  CD8  CCR5
Tube 33  NK cells                    CD56  CXCR3  CD3  CCR5
Tube 26 - CD19 vs CD3
[Figure: CD19 vs. CD3 plot with T and B cell populations labeled]
Ontology Schematic
Normal 2324
[Figure: IgD (<FITC-A>) vs. CD27 (<APC-A>) plots; gate percentages 1.67, 17.2, 7.79, 65.7, 9.29]
ID  Population  Color code     CD27       IgD     CD21     CD38     CD24     B220     CXCR3     Percentage (%), sample 2324
1   PB          red            CD27 high  IgD-    CD21low  CD38+    CD24-    B220low  CXCR3low  3.11
2   CD27+       cyan           CD27+      IgD-    CD21+    CD38-    CD24+    B220+    CXCR3+    5.95
6   Memory      magenta        CD27+      IgD-    CD21+    CD38-    CD24+    B220low  CXCR3-    4.37
9               blue           CD27+      IgD-    CD21low  CD38-    CD24-    B220low  CXCR3-    1.14
4   CD27-       gray           CD27low    IgD-    CD21-    CD38-    CD24-    B220low  CXCR3-    0.91
8   memory      pink           CD27low    IgD-    CD21-    CD38low  CD24-    B220+    CXCR3-    2.28
13              darkblue       CD27low    IgD-    CD21-    CD38-    CD24-    B220+    CXCR3+    1.98
5               green          CD27-      IgD-    CD21+    CD38-    CD24low  B220low  CXCR3-    0.47
12              darkgreen      CD27-      IgDlow  CD21+    CD38-    CD24+    B220low  CXCR3-    1.01
3   unswitched  yellow         CD27+      IgDlow  CD21+    CD38-    CD24+    B220low  CXCR3-    9.12
14  memory      purple         CD27+      IgDlow  CD21-    CD38low  CD24+    B220+    CXCR3-    0.29
7   naive       darkGray       CD27+      IgD+    CD21+    CD38low  CD24low  B220+    CXCR3-    20.47
10              grayish green  CD27low    IgD+    CD21+    CD38+    CD24+    B220+    CXCR3-    3.79
11              darkred        CD27-      IgD+    CD21+    CD38-    CD24low  B220+    CXCR3-    45.09
Marker Expression
B cells from Immgen
Acknowledgments
UT Southwestern: Yu (Max) Qian, Jamie Lee, Megan Kong, Jennifer Cai, Jie Huang, Nishanth Marthandan, Diane Xiang, Young Bun Kim, Paula Guidry, Eva Sadat
Ignacio Sanz (Rochester), Chungwen Wei (Rochester), Tim Mosmann (Rochester), Adam Seegmiller (UTSW), Nitin Karandikar (UTSW), Christine Martens (Emory), Chris Ding (UTA)
Alex Diehl (Jackson Labs), Martin Zand (Rochester)
Northrop Grumman: John Campbell, Liz Thompson, Jeff Wiser, Mike Attasi
Immune Tolerance Network: Dave Parrish, Keith Boyce, Tom Casale, Jeff Bluestone
Supported by NIH N01AI40076 and N01AI40041