
Determinantal Point Processes

Anđelka Zečević (andjelkaz@matf.bg.ac.rs)

Determinantal Point Processes (DPPs)

Problem of interest: How can we select a subset of a given set that is of good quality and good diversity at the same time?

Sources:
Alex Kulesza & Ben Taskar, 2012: Determinantal Point Processes for Machine Learning

Alexei Borodin, 2009: Determinantal Point Processes


Motivation - Information Retrieval


Motivation - Recommender Systems


Motivation - Extractive Text Summarization


Discrete Point Processes

● Number of items (videos, sentences, …): N
● Ground set: the set of indices, 𝒴 = {1, 2, …, N}
● Number of subsets: 2^N
● Point process: a probability measure P on 2^𝒴, the set of all subsets of 𝒴

Discrete Point Processes

Independent Point Process: each element i is included with probability p_i, independently of the others, so

P(Y = Y) = ∏_{i∈Y} p_i ∏_{i∉Y} (1 − p_i)
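A minimal sketch (sizes and probabilities are illustrative, not from the slides) of sampling from an independent point process with numpy:

```python
# Sample an independent point process: each item i enters Y
# with its own probability p_i, independently of the others.
import numpy as np

rng = np.random.default_rng(0)
N = 6
p = rng.uniform(size=N)          # inclusion probability p_i for each item
Y = {i for i in range(N) if rng.random() < p[i]}
print("sampled subset:", Y)
```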


DPP - marginal kernel


Let K = [K_ij] be an N × N real, symmetric matrix with the properties:

- K is positive semidefinite: xᵀKx ≥ 0 for all x ∈ ℝ^N
  (equivalently, all principal minors of K are non-negative; equivalently, all eigenvalues are non-negative)
- all eigenvalues are bounded above by 1

The matrix K is the so-called marginal kernel.

Intuition: K_ij represents the similarity of items i and j.

DPP - marginal kernel

If A ⊆ 𝒴, then K_A = [K_ij]_{i,j∈A} is the submatrix of K with rows and columns indexed by the elements of the set A.

DPP - definition

A point process P is called a determinantal point process if, for a random subset Y drawn according to P, for every A ⊆ 𝒴 it holds that

P(A ⊆ Y) = det(K_A)

Adaptation: for A = ∅ we take det(K_∅) = 1 by convention.

Notation: Y ~ DPP(K)

DPP - diversification

For A = {i}:
P(i ∈ Y) = K_ii

For A = {i, j}:
P(i, j ∈ Y) = K_ii K_jj − K_ij K_ji = P(i ∈ Y) P(j ∈ Y) − K_ij²

The more similar items i and j are (the larger K_ij), the less likely they are to appear together: DPPs have the ability to diversify!
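A minimal numeric check (not from the slides) that the pair marginal never exceeds the product of the singleton marginals:

```python
# Build a valid marginal kernel (symmetric, eigenvalues in [0, 1]) and
# check the repulsion inequality P(i,j in Y) <= P(i in Y) P(j in Y).
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # random orthonormal basis
K = Q @ np.diag(rng.uniform(size=4)) @ Q.T     # eigenvalues in [0, 1]

i, j = 0, 1
pair = K[i, i] * K[j, j] - K[i, j] ** 2        # det of the 2x2 minor K_{i,j}
print(pair, "<=", K[i, i] * K[j, j])
```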



DPP properties

Restriction
If Y is distributed as a DPP with marginal kernel K and A ⊆ 𝒴, then Y ∩ A is a DPP with marginal kernel K_A.

Complement
If Y is distributed as a DPP with marginal kernel K, then 𝒴 − Y is distributed as a DPP with marginal kernel I − K.

Scaling
If K = ɣK′ for some 0 ≤ ɣ < 1, then for all A ⊆ 𝒴 we have det(K_A) = ɣ^|A| det(K′_A).

L-ensemble

Let L = [L_ij] be an N × N real, symmetric, positive semidefinite matrix.

An L-ensemble is a point process that satisfies

P_L(Y = Y) ∝ det(L_Y)

Normalization constant: ∑_{Y⊆𝒴} det(L_Y) = det(L + I), so that

P_L(Y = Y) = det(L_Y) / det(L + I)

For A ⊆ 𝒴 we have ∑_{Y: A⊆Y⊆𝒴} det(L_Y) = det(L + I_Ā), where I_Ā is the identity restricted to the elements not in A.

Borodin & Rains, 2005
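A brute-force check (not from the slides) of the normalization constant:

```python
# Verify sum_Y det(L_Y) = det(L + I) by enumerating all 2^N subsets.
from itertools import combinations
import numpy as np

N = 5
rng = np.random.default_rng(2)
B = rng.normal(size=(N, N))
L = B @ B.T                                   # random PSD kernel

total = sum(np.linalg.det(L[np.ix_(Y, Y)])    # det of the 0x0 matrix is 1.0
            for k in range(N + 1)
            for Y in combinations(range(N), k))
print(total, np.linalg.det(L + np.eye(N)))    # the two agree
```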


DPP and L-ensemble

● An L-ensemble is a DPP, with marginal kernel K given by

K = L(L + I)⁻¹ = I − (L + I)⁻¹

● Not all DPPs are L-ensembles! When any eigenvalue of K achieves the upper bound of 1, the DPP is not an L-ensemble.


DPP and L-ensemble

If L = ∑_{n=1}^{N} λ_n v_n v_nᵀ is an eigendecomposition of L, then

K = ∑_{n=1}^{N} (λ_n / (λ_n + 1)) v_n v_nᵀ

is an eigendecomposition of K.
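A minimal check (not from the slides) of this relation:

```python
# K = L(L+I)^-1 shares L's eigenvectors, with each eigenvalue
# mapped lambda -> lambda / (lambda + 1).
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(6, 6))
L = B @ B.T

lam, V = np.linalg.eigh(L)                    # L = V diag(lam) V^T
K_direct = L @ np.linalg.inv(L + np.eye(6))
K_spectral = V @ np.diag(lam / (lam + 1)) @ V.T
print(np.allclose(K_direct, K_spectral))      # True
```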


Sampling Algorithm
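A Python sketch of the standard two-phase spectral sampler (Hough et al., 2006; Algorithm 1 in Kulesza & Taskar), assuming that is the algorithm shown here: phase one keeps eigenvector v_n with probability λ_n/(λ_n + 1); phase two picks items one at a time from the span of the kept eigenvectors.

```python
import numpy as np

def sample_dpp(L, rng):
    lam, V = np.linalg.eigh(L)
    # Phase 1: keep eigenvector n with probability lam_n / (lam_n + 1)
    keep = rng.random(len(lam)) < lam / (lam + 1)
    V = V[:, keep]                              # N x k
    Y = []
    while V.shape[1] > 0:
        # P(i) = ||i-th row of V||^2 / k  (rows of an orthonormal basis)
        p = (V ** 2).sum(axis=1) / V.shape[1]
        i = rng.choice(len(p), p=p)
        Y.append(i)
        # project the basis onto the subspace orthogonal to e_i
        j = int(np.argmax(np.abs(V[i]) > 1e-12))  # a column with V[i, j] != 0
        col = V[:, j].copy()
        V = V - np.outer(col / col[i], V[i])      # zero out row i
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)                # Gram-Schmidt step
    return sorted(Y)

rng = np.random.default_rng(0)
B = rng.normal(size=(8, 8))
print(sample_dpp(B @ B.T, rng))
```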


Sampling Algorithm Complexity

Algorithm complexity: O(Nk³), where k = |V| is the number of eigenvectors selected in the first phase.

Gram-Schmidt orthonormalization: O(Nk²) per iteration.

Bottleneck: the eigendecomposition of L, with complexity O(N³).

There exist exact and approximate DPP sampling algorithms with lower complexity (Derezinski et al., 2019).

Sampling Algorithm - Visualization


NP-Hardness

Ko et al. (1995): finding the set Y that maximizes det(L_Y) is NP-hard.

log det(L_Y) is a submodular function, so it can be approximately maximized in polynomial time (e.g., by a greedy algorithm).

f is submodular if f(Y ∪ {i}) − f(Y) ≥ f(Y′ ∪ {i}) − f(Y′)

for all Y ⊆ Y′ ⊆ 𝒴 and i ∉ Y′.

Submodularity


Question of Diversity

L can be decomposed as L = BᵀB for some D × N matrix B, with D ≤ N. Let B_i be the i-th column of B; then L_ij = B_iᵀ B_j.

Question of Quality

The columns of B are vectors that represent the items in the ground set 𝒴. Write each column as B_i = q_i φ_i, where q_i ≥ 0 is the quality of item i and φ_i is a unit vector (‖φ_i‖ = 1) of diversity features.

Question of Quality

Similarity matrix S:

S_ij = φ_iᵀ φ_j, so that L_ij = q_i S_ij q_j

Question of Quality and Diversity

P_L(Y) ∝ det(L_Y) = (∏_{i∈Y} q_i²) det(S_Y)

The product of qualities rewards subsets of high-quality items, while det(S_Y) rewards subsets whose diversity features are close to orthogonal.
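A minimal numeric check (not from the slides) of this decomposition:

```python
# det(L_Y) = (prod_{i in Y} q_i^2) * det(S_Y) for any subset Y.
import numpy as np

rng = np.random.default_rng(4)
N, D = 5, 5
phi = rng.normal(size=(D, N))
phi /= np.linalg.norm(phi, axis=0)            # unit diversity features
q = rng.uniform(1, 3, size=N)                 # quality scores
B = phi * q                                   # B_i = q_i * phi_i
L = B.T @ B                                   # L_ij = q_i phi_i^T phi_j q_j
S = phi.T @ phi                               # similarity matrix

Y = [0, 2, 3]
lhs = np.linalg.det(L[np.ix_(Y, Y)])
rhs = np.prod(q[Y] ** 2) * np.linalg.det(S[np.ix_(Y, Y)])
print(np.isclose(lhs, rhs))                   # True
```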


Decomposition for large sets

Dual representation: C = BBᵀ is a D × D real, symmetric, positive semidefinite matrix. When D ≪ N, it is much cheaper to work with C than with the N × N matrix L = BᵀB.
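A minimal check (not from the slides) that C and L share their non-zero eigenvalues:

```python
# C = B B^T (D x D) and L = B^T B (N x N) have the same non-zero
# eigenvalues, so spectral computations can run on the small matrix C.
import numpy as np

rng = np.random.default_rng(5)
D, N = 3, 8
B = rng.normal(size=(D, N))
L, C = B.T @ B, B @ B.T

eig_L = np.sort(np.linalg.eigvalsh(L))[-D:]   # top D eigenvalues of L
eig_C = np.sort(np.linalg.eigvalsh(C))
print(np.allclose(eig_L, eig_C))              # True
```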


Decomposition for large sets

Dual representation: if C = ∑_{n=1}^{D} λ_n v̂_n v̂_nᵀ is an eigendecomposition of C, then

L = ∑_{n=1}^{D} λ_n ( (1/√λ_n) Bᵀv̂_n ) ( (1/√λ_n) Bᵀv̂_n )ᵀ

C and L share their non-zero eigenvalues, and the corresponding eigenvectors of L are recovered as Bᵀv̂_n / √λ_n, so the sampling algorithm can be run using only the D × D matrix C.

Decomposition for large sets

Random projection: the dimension D of the diversity features can be large. Idea: project the diversity vectors into a space of low dimension d ≪ D.

There are theoretical guarantees (of Johnson-Lindenstrauss type) that random projection approximately preserves distances.
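A minimal sketch (not from the slides; dimensions are illustrative) of a Gaussian random projection:

```python
# Project D-dimensional diversity features down to d dimensions with a
# scaled Gaussian matrix; pairwise distances are approximately preserved.
import numpy as np

rng = np.random.default_rng(6)
D, d, N = 1000, 50, 20
phi = rng.normal(size=(D, N))                 # high-dimensional features

G = rng.normal(size=(d, D)) / np.sqrt(d)      # random projection matrix
psi = G @ phi                                 # projected features, d x N

i, j = 0, 1
print(np.linalg.norm(phi[:, i] - phi[:, j]),
      np.linalg.norm(psi[:, i] - psi[:, j]))  # approximately equal
```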


Learning

Training data: pairs of inputs and target subsets, {(X^(1), Y^(1)), …, (X^(T), Y^(T))}.

We assume that the DPP kernel L(X; θ) is parametrized in terms of generic parameters θ that reflect quality and/or diversity properties.

Learning

A conditional DPP is a conditional probabilistic model which assigns a probability to every possible subset Y ⊆ 𝒴(X). The model takes the form of an L-ensemble:

P(Y = Y | X) ∝ det(L_Y(X))

where L(X) is a positive semidefinite matrix that depends on the input X.

Learning

Goal: choose the parameter θ to maximize the conditional log-likelihood of the training data (so-called Maximum Likelihood Estimation, MLE):

θ* = argmax_θ ∑_{t=1}^{T} log P_θ(Y^(t) | X^(t))

with P_θ(Y | X) the conditional probability of an output Y given input X under parameter θ.

Learning the Parameters of DPP Kernels

● It is conjectured that MLE for DPPs is NP-hard to compute (Kulesza, 2012)
○ non-convex optimization
○ non-convexity holds even under various simplified assumptions on the form of L
○ approximating the mode of size k of a DPP to within a c^k factor (c > 1) is known to be NP-hard

● Special cases of quality or similarity functions:
○ Gaussian similarity with uniform quality
○ Gaussian similarity with Gaussian quality
○ Polynomial similarity with uniform quality

● Nelder-Mead simplex algorithm: does not require explicit knowledge of the derivatives of the log-likelihood function, but there are no theoretical guarantees of convergence to a stationary point.

Learning the Parameters of DPP Kernels

Heuristics:
○ Expectation-Maximization (Gillenwater et al., 2014)
○ MCMC (Affandi et al., 2014)
○ Fixed-point algorithms (Mariet & Sra, 2015)

Extractive Document Summarization

Goal: learn a DPP to model good summaries Y for a given input X.

Kulesza & Taskar, 2012

Extractive Document Summarization

Similarity features: sentences are represented as normalized tf-idf vectors.

Similarity function: S_ij = φ_iᵀ φ_j, the cosine similarity between sentences i and j.

Extractive Document Summarization

Quality scores: q_i(X) = exp(½ θᵀ f_i(X)), where f_i(X) is a feature vector for sentence i and θ is a parameter vector.

Quality features: length, position of the sentence in its original document, mean cluster similarity, personal pronouns, LexRank score, …

Extractive Document Summarization

Quality and similarity are combined in the form of a conditional DPP probability:

P_θ(Y | X) ∝ (∏_{i∈Y} q_i²(X)) det(S_Y(X))

It holds that the log-likelihood L(θ) is concave in θ.

Gradient: ∇L(θ) = ∑_t ( ∑_{i∈Y^(t)} f_i(X^(t)) − ∑_i K_ii^(t) f_i(X^(t)) ), where K^(t) is the marginal kernel for input X^(t): empirical feature counts minus expected feature counts.

Extractive Document Summarization

Learning the quality parameters θ by gradient ascent on the log-likelihood.
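A minimal sketch (not from the slides) of gradient ascent for one training pair, assuming the convention q_i(X) = exp(½ θᵀ f_i(X)) so that the gradient is exactly the empirical-minus-expected feature counts above:

```python
import numpy as np

def loglik_grad(theta, F, S, Y):
    """Gradient of log P_theta(Y | X) for one training pair.
    F: n x d sentence features, S: n x n similarity, Y: target summary."""
    q = np.exp(0.5 * F @ theta)                # q_i = exp(theta^T f_i / 2)
    L = q[:, None] * S * q[None, :]            # L_ij = q_i S_ij q_j
    K = L @ np.linalg.inv(L + np.eye(len(q)))  # marginal kernel
    # empirical feature counts minus expected counts (marginals K_ii)
    return F[Y].sum(axis=0) - np.diag(K) @ F

rng = np.random.default_rng(7)
n, d = 6, 3
F = rng.normal(size=(n, d))                    # toy sentence features
phi = rng.normal(size=(4, n))
phi /= np.linalg.norm(phi, axis=0)
S = phi.T @ phi                                # toy similarity matrix
theta = np.zeros(d)
for _ in range(20):                            # fixed step size, illustrative
    theta += 0.05 * loglik_grad(theta, F, S, Y=[1, 4])
print(theta)
```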


Extractive Document Summarization

Inference: for a given X and learned parameter θ, find the Y with at most b characters that maximizes P_θ(Y | X) (maximum a posteriori, MAP).

log P_θ(Y | X) is submodular in Y, so we can approximately maximize it with a simple greedy algorithm.
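A minimal greedy sketch (not from the slides; the budget handling is illustrative):

```python
# Greedily add the sentence with the best log det(L_Y) as long as it
# improves the objective and fits the character budget b.
import numpy as np

def greedy_map(L, lengths, b):
    Y, used, cur = [], 0, 0.0                 # log det of the empty set is 0
    while True:
        best, best_val = None, cur
        for i in range(len(L)):
            if i in Y or used + lengths[i] > b:
                continue
            idx = Y + [i]
            val = np.linalg.slogdet(L[np.ix_(idx, idx)])[1]
            if val > best_val:
                best, best_val = i, val
        if best is None:
            return Y
        Y.append(best)
        used += lengths[best]
        cur = best_val

rng = np.random.default_rng(8)
B = rng.normal(size=(6, 6))
L = B @ B.T + 0.1 * np.eye(6)                 # PSD, well-conditioned
lengths = rng.integers(20, 80, size=6)        # sentence lengths in characters
print(greedy_map(L, lengths, b=150))
```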


Diversified Recommendation

● Content is presented in the form of a feed: an ordered list of items through which the user browses
● The goodness of the recommendation is measured via a utility function
● Goal: select and order a set of k items such that the utility of the set is maximized

Gillenwater et al., 2018

k-DPP

A k-DPP on a discrete set 𝒴 is a distribution over all subsets Y ⊆ 𝒴 with cardinality k:

P_L^k(Y) = det(L_Y) / e_k(λ_1, …, λ_N)

The normalization constant e_k(λ_1, …, λ_N) = ∑_{|J|=k} ∏_{n∈J} λ_n is the k-th elementary symmetric polynomial over the eigenvalues λ_1, …, λ_N of the matrix L.

k-DPP

For N = 3 the elementary symmetric polynomials are:

e_1(λ_1, λ_2, λ_3) = λ_1 + λ_2 + λ_3
e_2(λ_1, λ_2, λ_3) = λ_1λ_2 + λ_1λ_3 + λ_2λ_3
e_3(λ_1, λ_2, λ_3) = λ_1λ_2λ_3

There is an efficient recursive algorithm for computing elementary symmetric polynomials.
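A sketch of that recursion over the first n eigenvalues, e_j^(n) = e_j^(n-1) + λ_n e_{j-1}^(n-1):

```python
import numpy as np

def elem_sympoly(lam, k):
    """Return e_0, ..., e_k over the eigenvalues lam."""
    e = np.zeros(k + 1)
    e[0] = 1.0                        # e_0 = 1 (empty product)
    for lam_n in lam:
        for j in range(k, 0, -1):     # reverse order so e[j-1] is the
            e[j] += lam_n * e[j - 1]  # previous-n value when we use it
    return e

print(elem_sympoly([2.0, 3.0, 5.0], 3))   # [ 1. 10. 31. 30.]
```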


k-DPP

Sampling:
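A sketch of the eigenvector-selection phase of k-DPP sampling (cf. Kulesza & Taskar): eigenvector n is kept with probability λ_n e_{k−1}^(n−1) / e_k^(n), so that exactly k eigenvectors are selected; phase two is then identical to the DPP sampler above.

```python
import numpy as np

def elem_sympoly_table(lam, k):
    """E[j, n] = e_j over the first n eigenvalues."""
    N = len(lam)
    E = np.zeros((k + 1, N + 1))
    E[0, :] = 1.0
    for n in range(1, N + 1):
        for j in range(1, k + 1):
            E[j, n] = E[j, n - 1] + lam[n - 1] * E[j - 1, n - 1]
    return E

def sample_k_eigenvectors(lam, k, rng):
    """Select exactly k eigenvector indices (assumes all lam > 0)."""
    E = elem_sympoly_table(lam, k)
    J, j = [], k
    for n in range(len(lam), 0, -1):
        if j == 0:
            break
        if rng.random() < lam[n - 1] * E[j - 1, n - 1] / E[j, n]:
            J.append(n - 1)
            j -= 1
    return J

rng = np.random.default_rng(9)
lam = np.array([0.5, 1.0, 2.0, 4.0])
print(sample_k_eigenvectors(lam, 2, rng))     # indices of kept eigenvectors
```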


Diversified Recommendation

DPP inputs: personalized quality scores and pairwise item distances.


Diversified Recommendation

The observed interaction of user u with a feed list of length N is given as a binary vector: y_u = [0, 1, 0, 1, 1, …, 0]

Goal: maximize the total number of interactions.

Slight modification, in order to train models from records of previous interactions: maximize the cumulative gain obtained by reranking the feed items, where each interacted item contributes according to the new rank j that the model assigns to it.

Diversified Recommendation

The interaction vector y_u = [0, 1, 0, 1, 1, …, 0] can be written as the set Y = {2, 4, 5}.

Assumption: Y represents a draw from the probability distribution defined by a user-specific DPP.

Diversified Recommendation

Kernel entries built from qualities and distances:

L_ii = q_i², and L_ij = α q_i q_j exp(−D_ij / (2σ²)) for i ≠ j

q_i - personalized quality score
D_ij - Jaccard distance built on video descriptions
α ∈ [0, 1) and σ are learned parameters

Run a grid search to find the values of the parameters that maximize the cumulative gain.

Diversified Recommendation

The kernel L can also be parametrized with learned functions:

f - parametrized quality function
g - parametrized content re-embedding function
δ - fixed regularization parameter
w - the parameters of L

Diversified Recommendation

Inference: greedy algorithm for submodular maximization over a size-k window.

Run k iterations, adding one video to Y in each iteration:

v* = argmax_{v∉Y} det(L_{Y∪{v}})

The greedy algorithm gives the natural order for the videos in the size-k window. Then the algorithm is repeated for the unused N − k videos.

DPP classes

● Discrete DPP
● k-DPP
● Continuous DPP
● Structured DPP: translation, bioinformatics, …
● Sequential DPP: video summarization, …

Code

Matlab: https://www.alexkulesza.com/code/dpp.tgz

Python: https://github.com/guilgautier/DPPy

Thank you!
