

Nonparametric Bayesian Models With Focused Clustering for Mixed Ordinal and Nominal Data

Maria DeYoreo

Department of Statistical Science, Duke University

Joint work with Jerry Reiter and D. Sunshine Hillygus

Research supported by the National Science Foundation under award SES-11-31897

Joint Statistical Meetings, 2015



Introduction

• Surveys often consist of ordinal and nominal categorical data

• Typically missing values due to item nonresponse

• Common strategies: fully model-based Bayesian inference, and multiple imputation (MI) in advance of likelihood-based or survey-weighted inference on completed data sets (Rubin, 1987; Little and Rubin, 2002)

• Log-linear models: difficult to specify with large numbers of variables, and they ignore the information in the ordering of categories



Seeking More Flexible Models

• A log-linear model with all two-way interaction terms is insufficient for capturing {party, vote, ideology} in the American National Election Study (ANES); similar features appear in the American Community Survey

• Multivariate probit model for ordinal data is restrictive due to the underlying multivariate normal (MVN) distribution (i.e., single polychoric correlation)

• Nonstandard distributional forms and complex relationships

• Dirichlet process (DP) mixture modeling techniques

• Mixtures of multivariate ordered probit models and mixtures of independent categorical distributions (product-multinomials; Dunson and Xing, 2009)
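To make the product-multinomial idea concrete, here is a minimal sketch showing how a mixture of independent categorical kernels can still produce a dependent joint distribution. The two-component setup and every probability below are illustrative assumptions, not values from the talk.

```python
import numpy as np

# Hypothetical two-component mixture for two binary variables.
# All numbers are illustrative assumptions.
weights = np.array([0.5, 0.5])      # mixture weights
psi_1 = np.array([[0.9, 0.1],       # component 1: Pr(X1 = 0), Pr(X1 = 1)
                  [0.1, 0.9]])      # component 2
psi_2 = np.array([[0.9, 0.1],       # same layout for X2
                  [0.1, 0.9]])

# Within a component the variables are independent, so the joint is
# sum_h w_h * psi_1[h, a] * psi_2[h, b].
joint = np.einsum("h,ha,hb->ab", weights, psi_1, psi_2)

# Product of the marginals: what the joint would be under independence.
independent = np.outer(joint.sum(axis=1), joint.sum(axis=0))

print(joint)
print(independent)
```

The joint differs from the product of its marginals, so all of the dependence between the two variables comes from the clustering.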



DP Mixture Modeling

• DP mixture models: consistency and full support

• In practice, they may concentrate on fitting the distribution of a subset of variables at the expense of others

• Highlighted in regression setting (X large compared to Y), and for mixed data (Banerjee et al., 2013; Wade et al., 2014; Murray and Reiter, 2015)

• Especially true for product-kernels (standard in modeling categorical and disparate data types) due to need to capture dependence through clustering

• Unnecessarily large number of clusters → poor predictive inference



Focus and Remainder Variables

• Missing data: the model may concentrate on fitting the marginal distributions of the nearly or fully complete variables at the expense of the variables with high fractions of missing data

• Exactly the opposite of what we want

• Relatedly, analysts may seek to estimate the joint distribution of a subset of variables in a database as accurately as possible, using the others to help predict missing values

• Solution: allow analysts to split variables into groups of focus variables and remainder variables

• Single set of clusters for the remainder variables and two sets for the focus variables (one for ordinal and one for nominal); induce dependence through a hierarchical prior (ITF prior; Banerjee et al., 2013)



Data

• n individuals measured on $p_c$ ordered categorical variables Y and $p_n$ nominal variables X

• Let Z be a latent continuous random vector; assume $Y_{ij} = l$ iff $\gamma_{j,l-1} < Z_{ij} \le \gamma_{j,l}$, for $j = 1, \dots, p_c$ and $l = 1, \dots, L_j$, for fixed γ

• Continuous data incorporated by viewing only part of Z as latent

• Split (Y, Z) and X into focus variables referenced by set A, and remainder variables referenced by set B; denote by $(Y^{(A)}, Z^{(A)}, X^{(A)})$ and $(Y^{(B)}, Z^{(B)}, X^{(B)})$

• $H_i^{(ZA)}$, $H_i^{(XA)}$, and $H_i^{(B)}$ label the ith individual’s mixture component for $(Y^{(A)}, Z^{(A)})$, $X^{(A)}$, and $(Y^{(B)}, Z^{(B)}, X^{(B)})$, respectively
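The link between the latent vector Z and the ordinal responses Y can be sketched as below. The function name and the cutoff values are hypothetical; the only thing taken from the slide is the rule Y = l iff γ(l−1) < Z ≤ γ(l), with the outer cutoffs assumed to be −∞ and +∞.

```python
import numpy as np

def ordinal_from_latent(z, cutoffs):
    """Map latent values z to categories 1..L.

    `cutoffs` holds the interior cutoffs gamma_1 < ... < gamma_{L-1};
    gamma_0 = -inf and gamma_L = +inf are implied (an assumption here).
    Category l is returned when gamma_{l-1} < z <= gamma_l.
    """
    # side="left" gives the index i with cutoffs[i-1] < z <= cutoffs[i],
    # which is exactly the half-open interval in the slide, shifted by one.
    return np.searchsorted(cutoffs, z, side="left") + 1

# Hypothetical fixed cutoffs for a 4-category variable.
gamma = np.array([-1.0, 0.0, 1.0])
z = np.array([-2.0, -1.0, 0.5, 2.0])
y = ordinal_from_latent(z, gamma)
print(y)
```

A latent value sitting exactly on a cutoff falls in the lower category, matching the "≤" on the right-hand side of the rule.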



DPMM-FC Model

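Data Model:

$$(Z_i^{(A)} \mid Z_i^{(B)}, X_i, H_i^{(ZA)} = r, \{\beta_h\}, \{\Sigma_h\}) \overset{ind}{\sim} N_{p_c^A}\big(D(Z_i^{(B)}, X_i)\beta_r, \Sigma_r\big), \quad i = 1, \dots, n$$

$$(X_{ij}^{(A)} \mid H_i^{(XA)} = s, \{\psi_h^{(j)}\}) \overset{ind}{\sim} \mathrm{categ}(\psi_s^{(j)}), \quad i = 1, \dots, n, \; j = p_c + 1, \dots, p_c + p_n^A$$

$$(Z_i^{(B)} \mid H_i^{(B)} = t) \overset{ind}{\sim} \prod_{j = p_c^A + 1}^{p_c} N(Z_{ij}^{(B)}; \mu_{tj}, \sigma_{tj}^2), \quad i = 1, \dots, n$$

$$(X_{ij}^{(B)} \mid H_i^{(B)} = t, \{\phi_h^{(j)}\}) \overset{ind}{\sim} \mathrm{categ}(\phi_t^{(j)}), \quad i = 1, \dots, n, \; j = p_c + p_n^A + 1, \dots, p$$

Dependent cluster labels:

$$\Pr(H_i^{(ZA)} = r, H_i^{(XA)} = s, H_i^{(B)} = t \mid H_i = h) = \pi_{r,h}^{(ZA)} \, \pi_{s,h}^{(XA)} \, \pi_{t,h}^{(B)}, \quad i = 1, \dots, n,$$

$$r = 1, \dots, N^{(ZA)}, \quad s = 1, \dots, N^{(XA)}, \quad t = 1, \dots, N^{(B)}$$

$$\Pr(H_i = h) = \pi_h, \quad i = 1, \dots, n, \; h = 1, \dots, N$$

+ Stick-breaking weights π; priors and hyperpriors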
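The label hierarchy above can be simulated with a short sketch. The truncation levels, the concentration value, and the use of Beta(1, alpha) stick fractions at every level are assumptions made for illustration; the slide only states that the weights are stick-breaking.

```python
import numpy as np

def stick_breaking(v):
    """Turn stick fractions v_1..v_{N-1} into N weights that sum to one."""
    v = np.append(v, 1.0)  # the last stick takes whatever remains
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
N, N_ZA, N_XA, N_B = 5, 4, 4, 6   # hypothetical truncation levels
alpha = 1.0                       # hypothetical concentration parameter

def draw_weights(size):
    return stick_breaking(rng.beta(1.0, alpha, size=size - 1))

pi_top = draw_weights(N)                                 # Pr(H_i = h)
pi_ZA = np.stack([draw_weights(N_ZA) for _ in range(N)])  # row h: pi^(ZA)_{.,h}
pi_XA = np.stack([draw_weights(N_XA) for _ in range(N)])  # row h: pi^(XA)_{.,h}
pi_B = np.stack([draw_weights(N_B) for _ in range(N)])    # row h: pi^(B)_{.,h}

# Draw the top-level label, then the three group-specific labels,
# which are conditionally independent given H_i = h.
n = 1000
H = rng.choice(N, size=n, p=pi_top)
H_ZA = np.array([rng.choice(N_ZA, p=pi_ZA[h]) for h in H])
H_XA = np.array([rng.choice(N_XA, p=pi_XA[h]) for h in H])
H_B = np.array([rng.choice(N_B, p=pi_B[h]) for h in H])
```

Because the three labels share the same top-level label, they are dependent once that label is marginalized out, even though each group keeps its own set of clusters.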


Model Properties

• $\Pr(H_i^{(ZA)} = r, H_i^{(XA)} = s, H_i^{(B)} = t) = \sum_{h=1}^{N} \pi_h \, \pi_{r,h}^{(ZA)} \, \pi_{s,h}^{(XA)} \, \pi_{t,h}^{(B)}$

• $X^{(A)}$ and $(X^{(A)}, X^{(B)})$ follow mixtures of product-multinomials; can capture any multivariate categorical distribution

• Model for $Y^{(A)}$ is a mixture of probit regressions; also very flexible

• Local dependence between $Y^{(A)}$ and $(Y^{(B)}, X^{(B)})$ in regression mean function and via the clusters

• Dependence between $X^{(A)}$ and $(Y^{(B)}, X^{(B)})$ captured mostly through clusters
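The marginal label probabilities in the first bullet can be checked numerically. This is a sketch with arbitrary Dirichlet-drawn weights and small hypothetical truncation levels, not the fitted model.

```python
import numpy as np

rng = np.random.default_rng(1)
N, N_ZA, N_XA, N_B = 3, 2, 2, 4   # hypothetical truncation levels

# Arbitrary weights for illustration; row h holds the weights given H_i = h.
pi_top = rng.dirichlet(np.ones(N))
pi_ZA = rng.dirichlet(np.ones(N_ZA), size=N)
pi_XA = rng.dirichlet(np.ones(N_XA), size=N)
pi_B = rng.dirichlet(np.ones(N_B), size=N)

# Pr(H^(ZA) = r, H^(XA) = s, H^(B) = t) = sum_h pi_h * pi_ZA[h, r] * pi_XA[h, s] * pi_B[h, t]
joint = np.einsum("h,hr,hs,ht->rst", pi_top, pi_ZA, pi_XA, pi_B)

# Marginal of the focus-ordinal label and of the remainder label.
marg_ZA = joint.sum(axis=(1, 2))
marg_B = joint.sum(axis=(0, 1))

# Joint of (H^(ZA), H^(B)) versus the product of their marginals.
joint_ZA_B = joint.sum(axis=1)
gap = np.abs(joint_ZA_B - np.outer(marg_ZA, marg_B)).max()
print(joint.sum(), gap)
```

A nonzero gap indicates that the focus and remainder labels are marginally dependent, which is how the clusters carry information from B to A.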



Simulations

• 8 scenarios from three binary factors: rate of missingness in A (30% and 5%), dimension of A (|A| = |B| and |A| < |B|), and sample size (500 and 3000)

• 50 simulated data sets in each scenario

• Generate complete data from a series of GLMs including two- and three-way interaction terms, then blank values at random

• Fit the DPMM-FC model with D(·) encoding only main effects to impute m = 10 completed data sets

• Also fit a model which does not distinguish between focus and remainder variables (DPMM-Mix; Murray and Reiter, 2015)

• Estimate all marginal and bivariate probabilities with point estimates and confidence intervals
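Point estimates and variances from the m completed data sets are typically pooled with the multiple-imputation combining rules of Rubin (1987). The sketch below assumes those standard rules were used here, which the slides do not state explicitly; the function name and the input numbers are hypothetical.

```python
import numpy as np

def combine_mi(estimates, variances):
    """Pool m completed-data estimates with Rubin's (1987) combining rules.

    Returns the pooled estimate, its total variance, and the degrees of
    freedom of the reference t distribution.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                 # pooled point estimate
    u_bar = u.mean()                 # within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    total_var = u_bar + (1.0 + 1.0 / m) * b
    if b > 0:
        df = (m - 1) * (1.0 + u_bar / ((1.0 + 1.0 / m) * b)) ** 2
    else:
        df = np.inf                  # no between-imputation variability
    return q_bar, total_var, df

# Made-up estimates of one probability from m = 5 completed data sets.
q_bar, total_var, df = combine_mi(
    [0.40, 0.42, 0.38, 0.41, 0.39], [0.001] * 5
)
print(q_bar, total_var, df)
```

An interval then follows as the pooled estimate plus or minus a t quantile with `df` degrees of freedom times the square root of `total_var`.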



Main Simulation Results

• DPMM-FC estimates distributions of focus variables especially well and provides coverage near the nominal 95% level

• Reliable inferences for the distributions among the remainder variables

• Estimates of probabilities based on the joint distribution of focus and remainder variables are generally decent, but coverage is below the nominal 95% level

• DPMM-FC performs better than DPMM-Mix on focus variables (particularly nominal-nominal), and is more stable in terms of quality of estimates for remainder variables

• DPMM-Mix performs better on focus-remainder (A, B) relationships, particularly when A is nominal



Data and Modeling Approach

• 2012 ANES (pre-presidential election): interested in measurements related to voting behavior, ideology, candidate preference

• Ideology (ordered 7-point scale) missing at 28% rate, 2008 candidate preference (nominal) missing at 35% rate, Tea Party support (ordered 7-point scale) missing at 17% rate . . .

• Only 333 of n = 2054 individuals have complete data

• Estimate DPMM-FC on 20 variables, placing variables missing at high rates and those of substantive importance in A, and the mostly observed demographic and attitudinal variables in B

• Survey-weighted inference for finite population quantities after creating m = 10 multiple imputations



2012 Candidate Preference

• Goal in pre-election surveys is to identify the subset of the electorate that will vote and to predict their candidate preferences; thus we focus on analyses involving vote intent in 2012

• Key substantive question: Could Obama hold onto Independents?

• Pr(vote 12 | party=Ind, prefer Obama 08) suggests that while most Independents who preferred Obama in 2008 intended to vote for him again in 2012, many said they were not going to vote

→ Obama did not lose Independents to Romney, but many planned to stay home in the 2012 election.
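A conditional probability such as Pr(vote 12 | party=Ind, prefer Obama 08) can be estimated on one completed data set as a ratio of survey-weighted totals. The sketch below uses a tiny made-up data set and a hypothetical helper; it is not ANES data and does not reproduce any result from the talk.

```python
import numpy as np

def weighted_conditional_props(outcome, in_subgroup, weights, categories):
    """Survey-weighted share of each outcome category within a subgroup."""
    total = weights[in_subgroup].sum()
    return {
        c: weights[in_subgroup & (outcome == c)].sum() / total
        for c in categories
    }

# Toy records (made up): 2012 vote intent, subgroup flag, survey weight.
vote12 = np.array(["Obama", "Romney", "not voting", "Obama", "Obama"])
ind_obama08 = np.array([True, True, True, False, True])
w = np.array([1.0, 2.0, 1.0, 5.0, 4.0])

props = weighted_conditional_props(
    vote12, ind_obama08, w, ["Obama", "Romney", "not voting"]
)
print(props)
```

With multiple imputations, this calculation would be repeated on each completed data set and the results pooled.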



Vote intent in 2012 conditional on 2008 preference:

[Figure: three panels showing Pr(vote 2012 | Obama 2008), Pr(vote 2012 | McCain 2008), and Pr(vote 2012 | other 2008); x-axis: 2012 cand. pref., with tick labels Obama, Romney, Ind, not voting; y-axis from 0.0 to 1.0.]

• Accounting for missingness → greater stability in preferences on the Republican side than the Democratic side

• Pr(Obama 2012 | Obama 2008) ≈ 0.64, with a significant proportion saying they will not vote; Pr(Romney 2012 | McCain 2008) ≈ 0.75



Exploring Voter Decision Making

• Logistic regression with “prefer Obama in 2012” as the binary response and explanatory variables including main effects and all two-way interactions for preference in 2008, party, ideology (liberal, moderate, conservative), and opinion of Tea Party (oppose, none, support)

• Ideology is not significant, so it is removed

• Overall the explanatory variable effects are as expected, but interaction effects give insight into voter decision making

• Compute predicted probabilities of voting for Obama for each of the 27 combinations of 2008 preference, party, and Tea Party support
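The 27-cell prediction grid can be sketched as follows. The level names come from the slides; the reference-cell coding, the helper names, and the all-zero placeholder coefficients are assumptions for illustration, so the printed probabilities carry no substantive meaning.

```python
import itertools
import numpy as np

levels = {
    "pref08": ["neither", "McCain", "Obama"],
    "party": ["Dem", "Rep", "Ind"],
    "tea_party": ["oppose", "none", "support"],
}

def dummy(value, lv):
    """Reference-cell coding that drops the first level (an assumption)."""
    return [1.0 if value == level else 0.0 for level in lv[1:]]

def design_row(combo):
    """Intercept, main effects, and all two-way interactions for one cell."""
    main = [dummy(combo[name], lv) for name, lv in levels.items()]
    row = [1.0] + [x for block in main for x in block]
    for a, b in itertools.combinations(range(len(main)), 2):
        row += [xa * xb for xa in main[a] for xb in main[b]]
    return row

# All 3 x 3 x 3 = 27 combinations of the three factors.
grid = [dict(zip(levels, c)) for c in itertools.product(*levels.values())]
X = np.array([design_row(c) for c in grid])

def predict(X, beta):
    """Inverse-logit of the linear predictor."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))

beta = np.zeros(X.shape[1])   # placeholder; fitted coefficients would go here
probs = predict(X, beta)
print(X.shape, probs[:3])
```

The design has 19 columns (1 intercept, 6 main-effect terms, 12 interaction terms), so the 27 cells are not fitted exactly and the three-way interaction is left out, as in the slide.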



Figure: Predicted probability of preferring Obama in 2012 for Democrats (left), Republicans (middle) and Independents (right), by 2008 preference (N=neither, M=McCain, O=Obama) and Tea Party support (x=oppose, +=support, and •=neither).

• Tea Party support is not strongly related to 2012 preference for partisans who previously voted along party lines (Obama Democrats and McCain Republicans); however, it is predictive of 2012 vote among Obama Republicans and Independents.



[Figure: predicted probability Pr(vote Obama 2012), on a 0.0 to 1.0 scale, by 2008 preference (N, M, O), in panels Dem, Rep, and Ind.]

• Two groups of Independents: those who behave much like partisan identifiers (“closet partisans”) and are not really “up for grabs” in the campaign; in contrast, the Independents who are actually “in play” in the election are those who are ambivalent or cross-pressured

• Tea Party support is important for those who are cross-pressured



Contributions and Future Work

• Flexible joint model for mixed data

• Useful for dealing with missing data in large-scale surveys containing many categorical variables

• Proposed separating variables into groups based on rate of missingness and importance in subsequent analysis

• More research needed into determining, in a given problem, which variables should be placed in A and B

• Future Work: What to do if some variables C are completely observed? Wasteful to include them in the joint model!

• ANES application: compare inferences from the face-to-face interviews and the Internet survey



THANK YOU!



References

A. Banerjee, J. Murray, and D. Dunson. (2013), “Bayesian Learning of Joint Distributions of Objects,” in Proceedings of the 16th Intl. Conf. on Artificial Intelligence and Statistics.

D. Dunson and C. Xing. (2009), “Nonparametric Bayes Modeling of Multivariate Categorical Data,” Journal of the American Statistical Association, 104, 1042–1051.

L. Hannah, D. Blei, and W. Powell. (2011), “Dirichlet Process Mixtures of Generalized Linear Models,” Journal of Machine Learning Research, 1, 1–33.

R. Little and D. Rubin. (2002), Statistical Analysis with Missing Data, New York: Wiley.

J. Murray and J. Reiter. (2015), “Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models with Local Dependence,” http://arxiv.org/pdf/1410.0438v1.pdf.

D. Rubin. (1987), Multiple Imputation for Nonresponse in Surveys, New York: John Wiley and Sons.

S. Wade, D. Dunson, S. Petrone, and L. Trippa. (2014), “Improving Prediction from Dirichlet Process Mixtures via Enrichment,” Journal of Machine Learning Research, 15, 1041–1071.
