
Finding Large Elements in Factorized Tensors

Michael E. HOULE¹, Hisashi KASHIMA², Michael NETT¹,²

¹ National Institute of Informatics, Japan   ² The University of Tokyo, Japan

Summary

Low-rank tensor decompositions have become valuable tools for many practical problems arising in machine learning and data mining applications concerned with higher-order data. While much research has focused on specific methods for decomposition, only a limited amount of attention has been given to the process of locating elements of interest from within a tensor. Efficient strategies for identifying and extracting targeted elements are very important in practical applications, which frequently operate on scales at which one cannot afford the time requirements associated with brute-force methods. We propose several algorithms for solving or approximating the problem of locating the k largest elements of a tensor given only its factorization. We provide experimental evidence that our methods enable follow-on applications in areas such as link prediction and recommender systems to operate at a significantly larger scale.

Motivation

In virtually all practical applications of high-order tensors, the number of elements of the tensor is far larger than the number of objects in the data set. Generally speaking, in order to be effective, tensor-based applications must take advantage of the sparsity of the representation, and avoid the expensive brute-force scanning of tensor elements. One of the most important issues in the area of tensor analysis is therefore that of tensor completion, where after observing only a limited number of element values, one seeks to deduce properties of the remaining unobserved values. Depending on the application, one usually requires knowledge of a relatively small set of tensor elements, rather than the entire unobserved portion of the tensor at hand. Tensor completion strategies have many real-world applications including, but not limited to, personalized tag recommendation and link prediction in social web services.

Vector, Matrix and Tensor Products

• Let X and Y be tensors of dimensions $n_1, \dots, n_p$. Their element-wise product $X * Y$ is given by

  $[X * Y]_{i_1 \dots i_p} = [X]_{i_1 \dots i_p} \cdot [Y]_{i_1 \dots i_p}$.

• Let $u^{(1)}, \dots, u^{(p)}$ be vectors in $\mathbb{R}^n$. Their inner product is

  $\Phi(u^{(1)}, \dots, u^{(p)}) = \sum_{i=1}^{n} \prod_{j=1}^{p} u^{(j)}_i$.

• Let $u^{(1)}, \dots, u^{(p)}$ be vectors with $u^{(i)} \in \mathbb{R}^{n_i}$ for $1 \le i \le p$. Their outer product is the order-p tensor satisfying

  $[\Psi(u^{(1)}, \dots, u^{(p)})]_{i_1 \dots i_p} = \prod_{j=1}^{p} u^{(j)}_{i_j}$.

Note that, whenever only two vectors u and v are involved, one may conveniently use the simpler notation $\Phi(u, v) = u^\top v$ and $\Psi(u, v) = u v^\top$.
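As a concrete illustration (not part of the poster), the two products can be sketched in Python with NumPy: Φ multiplies equal-length vectors element-wise and sums the result, while Ψ builds the order-p tensor of all index combinations. The function names are illustrative.

```python
import numpy as np

def inner(*vectors):
    """Generalized inner product Phi: sum over i of u^(1)_i * ... * u^(p)_i.
    All vectors must share the same length n."""
    stacked = np.stack(vectors)          # shape (p, n)
    return stacked.prod(axis=0).sum()

def outer(*vectors):
    """Generalized outer product Psi: order-p tensor with
    [Psi]_{i1...ip} = u^(1)_{i1} * ... * u^(p)_{ip}."""
    result = vectors[0]
    for v in vectors[1:]:
        result = np.multiply.outer(result, v)
    return result

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
# For two vectors these reduce to the familiar u.v and u v^T.
assert inner(u, v) == u @ v                      # 11.0
assert np.allclose(outer(u, v), np.outer(u, v))  # [[3, 4], [6, 8]]
```

For p > 2 the same calls apply unchanged, e.g. `outer(u, v, w)` yields an order-3 tensor.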

Problem Definition

Let X be a real-valued order-p tensor in $\mathbb{R}^{n_1 \times \dots \times n_p}$ and let $k \in \mathbb{N}$. The top-k problem is to find a set S of tensor elements whose sum is maximal, subject to $|S| = \min\{k, n_1 \cdots n_p\}$.


Data Model

The provided data is modeled using a low-rank factorization of rank r with factor matrices $U^{(1)}, \dots, U^{(p)}$:

  $X = \sum_{i=1}^{r} X^{(i)} = \sum_{i=1}^{r} \Psi(u^{(1)}_{:i}, \dots, u^{(p)}_{:i})$

The tensor element at location $(i_1, \dots, i_p)$ is the inner product of the corresponding rows of the factor matrices:

  $[X]_{i_1 \dots i_p} = \sum_{j=1}^{r} [\Psi(u^{(1)}_{:j}, \dots, u^{(p)}_{:j})]_{i_1 \dots i_p} = \sum_{j=1}^{r} \prod_{k=1}^{p} u^{(k)}_{i_k j} = \Phi(u^{(1)}_{i_1:}, \dots, u^{(p)}_{i_p:})$
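The row-wise evaluation above can be sketched in Python with NumPy, assuming factor matrices of shape $(n_k, r)$; the `element` function and the dense cross-check are illustrative, not from the poster.

```python
import numpy as np

def element(factors, index):
    """Evaluate [X]_{i1...ip} from rank-r factor matrices U^(k) of shape (n_k, r):
    the generalized inner product of the selected rows."""
    rows = np.stack([U[i] for U, i in zip(factors, index)])  # shape (p, r)
    return rows.prod(axis=0).sum()

rng = np.random.default_rng(0)
factors = [rng.standard_normal((4, 3)),
           rng.standard_normal((5, 3)),
           rng.standard_normal((6, 3))]

# Cross-check against the dense tensor built from the r rank-one terms.
dense = sum(np.multiply.outer(np.multiply.outer(factors[0][:, j],
                                                factors[1][:, j]),
                              factors[2][:, j]) for j in range(3))
assert np.isclose(element(factors, (1, 2, 3)), dense[1, 2, 3])
```

Each lookup costs O(rp) time, which is what makes factorized storage attractive compared with materializing the full tensor.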

Method I – Enumeration

An exhaustive enumeration of all tensor elements provides exact answers to top-k queries in time $\Theta(n_1 \cdots n_p (rp + \log k))$.
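A minimal sketch of this baseline (the function name is illustrative, not from the poster): each of the $n_1 \cdots n_p$ index tuples is evaluated in O(rp) time, and a min-heap of size k accounts for the log k term.

```python
import heapq
import itertools
import numpy as np

def topk_enumerate(factors, k):
    """Exact top-k by scanning every index tuple of the factorized tensor,
    keeping the k largest (value, index) pairs in a min-heap."""
    heap = []
    ranges = [range(U.shape[0]) for U in factors]
    for index in itertools.product(*ranges):
        rows = np.stack([U[i] for U, i in zip(factors, index)])
        value = rows.prod(axis=0).sum()          # O(rp) element evaluation
        if len(heap) < k:
            heapq.heappush(heap, (value, index))
        elif value > heap[0][0]:                 # O(log k) heap update
            heapq.heapreplace(heap, (value, index))
    return sorted(heap, reverse=True)

rng = np.random.default_rng(1)
factors = [rng.standard_normal((3, 2)), rng.standard_normal((4, 2))]
top = topk_enumerate(factors, 3)
dense = factors[0] @ factors[1].T   # order-2 case: the tensor is U V^T
assert np.isclose(top[0][0], dense.max())
```

The loop makes the scaling problem concrete: runtime grows with the product of the mode sizes, regardless of how compact the factorization is.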

Method II – Greedy Heuristic

An indexing-based method which provides approximate answers to top-k queries in time $O(p^2 s k t r)$.

Method III – Divide and Conquer

A clustering-based divide-and-conquer method which aims to reduce the time complexity by discarding as many subproblems as possible.

[Figure: partition of X into subtensors X[1,1,1], X[1,2,1], X[2,1,1], X[2,2,1].]

Individual subproblems can either be solved exactly via enumeration, or approximated using the previously proposed greedy heuristic.
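The poster does not spell out its pruning rule, so the following is only a hypothetical sketch of the divide-and-conquer idea for the top-1 case. It uses a simple valid upper bound per block of index ranges (sum over rank-one terms of the product of the largest absolute factor entries in each range) and a best-first search that discards blocks whose bound cannot beat the current best element.

```python
import heapq
import numpy as np

def block_bound(factors, blocks):
    """Upper bound on any element in the block: bound each rank-one term by
    the product of the largest absolute entries per range, then sum the
    per-term bounds (max of a sum <= sum of per-term maxima)."""
    bound = 0.0
    for j in range(factors[0].shape[1]):
        term = 1.0
        for U, (lo, hi) in zip(factors, blocks):
            term *= np.abs(U[lo:hi, j]).max()
        bound += term
    return bound

def top1_divide_conquer(factors):
    """Best-first search over blocks of index ranges; blocks whose bound
    cannot beat the best exact value found so far are discarded."""
    p = len(factors)
    root = tuple((0, U.shape[0]) for U in factors)
    heap = [(-block_bound(factors, root), root)]
    best_val, best_idx = -np.inf, None
    while heap:
        neg_bound, blocks = heapq.heappop(heap)
        if -neg_bound <= best_val:
            break  # remaining blocks have even smaller bounds: all pruned
        if all(hi - lo == 1 for lo, hi in blocks):
            index = tuple(lo for lo, _ in blocks)
            rows = np.stack([U[i] for U, i in zip(factors, index)])
            value = rows.prod(axis=0).sum()
            if value > best_val:
                best_val, best_idx = value, index
            continue
        # split the widest index range in half and push both children
        axis = max(range(p), key=lambda a: blocks[a][1] - blocks[a][0])
        lo, hi = blocks[axis]
        mid = (lo + hi) // 2
        for child in ((lo, mid), (mid, hi)):
            sub = blocks[:axis] + (child,) + blocks[axis + 1:]
            heapq.heappush(heap, (-block_bound(factors, sub), sub))
    return best_val, best_idx

rng = np.random.default_rng(2)
factors = [rng.standard_normal((6, 3)), rng.standard_normal((7, 3))]
val, idx = top1_divide_conquer(factors)
dense = factors[0] @ factors[1].T
assert np.isclose(val, dense.max())
```

Since a child's ranges are subsets of its parent's, bounds only shrink along the search tree, so popping blocks in decreasing bound order makes the early-exit test sound.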

Importance of Scalability

Projections show that for real application data, exhaustive query processing may require up to 31 years.

[Figure: enumeration time [s] per data set (Delicious, Favorite, LastFM, Movielens, Retweet) on a log scale from 1 to 10¹⁰, comparing the reduced set against the full set.]

The proportion of uninteresting tensor elements is overwhelmingly large.

[Figure: distribution of elements in the LastFM data set; number of observations (log scale, 1 to 10⁹) versus tensor element value (-1 to 2.5).]

Evaluation

Performance of the methods for the LastFM data set. The input tensor has size 2,100 × 18,744 × 12,647.

[Figure: comparison on the full LastFM data set; accumulated top-1000 elements (200 to 1300) versus runtime [s] (0 to 600) for the methods Enum, DCGreedy, DC, and Greedy.]

Additional Material

• Technical Report
• Poster
• Implementation
• Documentation

Contact: Michael E. HOULE / Visiting Professor, National Institute of Informatics

Tel: 03-4212-2538 | Fax: 03-3556-1916 | Email: meh @ nii.ac.jp