"Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar...

Business Proprietary & Confidential Quantum Clustering Sigalit Bechler, Data Researcher SimilarWeb & Tel-Aviv university December 1, 2014


Page 1: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Business Proprietary & Confidential

Quantum Clustering

Sigalit Bechler, Data Researcher

SimilarWeb & Tel-Aviv University

December 1, 2014

Page 2: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Agenda

• SimilarWeb – a quick introduction

• Quantum Clustering

Page 3: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

SimilarWeb

• Funding: $65M
• Founded: 2007
• Offices: 6
• Employees: 300

Some of our clients

Page 4: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

What We Do

We Provide Digital Insights to the Entire World

60M WEBSITES DAILY
FOR EVERY WEBSITE:
• TRAFFIC ESTIMATION
• TRAFFIC SOURCES
• AUDIENCE
• INDUSTRY
• CONTENT

2M MOBILE APPS DAILY
FOR EVERY MOBILE APP:
• RATING
• ENGAGEMENT
• APP STORE DATA
• CATEGORY
• KEYWORDS

Page 5: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

What We Do

We Provide Digital Insights to the Entire World

60M WEBSITES DAILY
FOR EVERY WEBSITE:
• TRAFFIC METRICS
• TRAFFIC SOURCES
• AUDIENCE
• INDUSTRY
• CONTENT

2M MOBILE APPS DAILY
FOR EVERY MOBILE APP:
• RATING
• ENGAGEMENT
• APP STORE
• CATEGORY
• KEYWORDS

INGEST: INTERNATIONAL PANEL, CRAWLING, ISP DATA, LEARNING SET
• 90K events/sec
• 4TB/day compressed

BATCH & ON DEMAND PROCESSING:
• 100TB I/O a day
• > 150 machines just in processing cluster
• Statistical & machine learning algorithms

Page 6: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Quantum clustering

Prof. David Horn and Dr. Assaf Gottlieb, Phys. Rev. Lett. 88 (2002) 018702

Page 7: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Clustering - general overview

• Unsupervised learning problem: dealing with unlabeled data
• Goal: group together elements that are similar to each other in some sense
• We usually have an idea of what this “sense” should be
• May reveal new patterns

Example table columns: label, feature1, feature2, feature3, feature4

Page 8: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Clustering - general overview

• The user identity is unknown
• Leaving it in for the example

Example table columns: label, feature1, feature2, feature3, feature4 (every label shown as "?")

Page 9: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Clustering - general overview

• Grouping by gender

Example table columns: label, feature1, feature2, feature3, feature4

Page 10: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Clustering - general overview

• Grouping by fields of interest

Example table columns: label, feature1, feature2, feature3, feature4

Page 11: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Quantum Clustering - Motivation

• Relatively easy clustering task
• Still need to set the number of clusters manually.

• Very complex clustering task.
• Unbiased analysis of X-ray absorption data

Page 12: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Quantum Clustering - Example

M. Weinstein, F. Meirer, A. Hume, Ph. Sciau, G. Shaked, R. Hofstetter, E. Persi, A. Mehta, D. Horn, "Analyzing Big Data with Dynamic Quantum Clustering", http://arxiv.org/abs/1310.2700

Page 13: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Why is it important?

• Information era: big data
• Massive collection of data
• Strong presence of outliers
• Unknown structures
• Non-trivial patterns

Quantum Clustering

Distributed computation technologies

Page 14: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

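Quantum clustering - the potential trick

1. Turn data points into Gaussians centered around the data points: $\psi(\mathbf{x}) = \sum_i e^{-(\mathbf{x}-\mathbf{x}_i)^2 / 2\sigma^2}$

2. Plug into the Schrödinger equation, $\left(-\frac{\sigma^2}{2}\nabla^2 + V(\mathbf{x})\right)\psi = E\psi$, and find $V(\mathbf{x})$. Define the solution for V as the potential transform.

• Single point → Gaussian → harmonic potential $V(\mathbf{x}) = (\mathbf{x}-\mathbf{x}_1)^2 / 2\sigma^2$
• Multi-points: $V(\mathbf{x}) = E + \frac{\sigma^2}{2}\frac{\nabla^2\psi}{\psi} = E - \frac{d}{2} + \frac{1}{2\sigma^2\psi}\sum_i (\mathbf{x}-\mathbf{x}_i)^2 e^{-(\mathbf{x}-\mathbf{x}_i)^2 / 2\sigma^2}$, with d the number of dimensions

3. Move each data point toward the minima of the potential surface with gradient descent.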
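The three steps above can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation: it assumes the potential from the Horn and Gottlieb paper, V − E = −d/2 + (1/(2σ²ψ)) Σ_i (x − x_i)² exp(−(x − x_i)²/(2σ²)), evaluates V only up to the constant E, and the function names, step size, iteration count, and toy data are illustrative assumptions.

```python
import numpy as np


def qc_potential_and_grad(points, data, sigma):
    """Return V(x) - E and its gradient at each row of `points`.

    The wave function is a sum of Gaussians of width `sigma` centred on `data`.
    """
    diff = points[:, None, :] - data[None, :, :]   # pairwise offsets, shape (m, n, d)
    sq = np.sum(diff ** 2, axis=2)                 # squared distances, shape (m, n)
    w = np.exp(-sq / (2.0 * sigma ** 2))           # Gaussian weights
    psi = w.sum(axis=1)                            # wave function at each point
    a = (w * sq).sum(axis=1)                       # sum_i r_i^2 * w_i
    d = data.shape[1]
    v = -d / 2.0 + a / (2.0 * sigma ** 2 * psi)

    # Analytic gradient of a / (2 sigma^2 psi) via the quotient rule.
    grad_a = np.einsum("mn,mnd->md", w * (2.0 - sq / sigma ** 2), diff)
    grad_psi = -np.einsum("mn,mnd->md", w, diff) / sigma ** 2
    grad_v = (grad_a / psi[:, None]
              - a[:, None] * grad_psi / psi[:, None] ** 2) / (2.0 * sigma ** 2)
    return v, grad_v


def quantum_clustering(data, sigma, step=0.1, n_steps=200):
    """Slide replicas of the data points downhill on the fixed potential surface."""
    replicas = data.copy()
    for _ in range(n_steps):
        _, grad = qc_potential_and_grad(replicas, data, sigma)
        replicas -= step * grad
    return replicas


# Toy data (assumed for illustration): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(30, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.5, size=(30, 2)),
])
final = quantum_clustering(data, sigma=1.0)
```

The Gaussians stay centered on the original data while the replicas move, so the potential surface itself does not change during the descent. Points far from every data point would make ψ underflow; a production version would need to guard against that.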

Page 15: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Quantum clustering – reasoning

• Why does it make sense?
• Kinetic term $-\frac{\sigma^2}{2}\nabla^2$: models the divergence effects from the cluster center.
• $V(\mathbf{x})$: the effects that bind points from the same cluster together.
• We may say that we are looking for the minima of $V(\mathbf{x})$, since this is where the divergence effects are minimal (slow changes give a small numerator, high density gives a large denominator): $V(\mathbf{x}) = E + \frac{\sigma^2}{2}\frac{\nabla^2\psi}{\psi}$

• SVD may be performed prior to the clustering: $X = U S V^T$, then perform QC on U or V
• This addresses the fact that each feature has a different type and scale.
• It enables dimension reduction to the components with the highest variance.
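The SVD preprocessing step can be illustrated with NumPy's `np.linalg.svd`. This is a sketch under assumptions: the helper name `svd_reduce`, the choice of k, and the synthetic matrix are illustrative, and whether to center or rescale the columns first is left open because the slides do not say.

```python
import numpy as np


def svd_reduce(X, k):
    """Return the first k columns of U (and singular values) from X = U S V^T."""
    # full_matrices=False gives the thin SVD; singular values come sorted descending.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], S[:k]


# Synthetic data (assumed): 100 samples, 5 features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) * np.array([100.0, 10.0, 1.0, 0.1, 0.01])

U_k, S_k = svd_reduce(X, k=2)   # rows of U_k are the points passed on to QC
```

The columns of U are orthonormal, so the reduced coordinates share a common scale regardless of the units of the original features.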

Page 16: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

The Crabs Example (from Ripley’s textbook), 4 classes, 50 samples each, d=5

Panels: The data; 3D plot of the potential

A topographic map of the probability distribution for the crab data set with σ=1/2 using principal components 2 and 3. There exists only one maximum.

A topographic map of the potential for the crab data set with σ=1/2 using principal components 2 and 3. The four minima are denoted by crossed circles. The contours are set at values V=cE for c=0.2,…,1.

Page 17: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Quantum clustering - summary

• Built-in capability to handle outliers (divergence part): no need for additional parameters or processes, no effect on the number of significant clusters

• A cluster may be a line or another shape, not necessarily a point in the feature space.

• The clusters are not defined by geometric or probability considerations alone

• No need to pre-define the number of clusters
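Because the number of clusters is not set in advance, it has to be read off from where the points end up after the descent. One simple way to do that, shown here as an assumption rather than as the method used in the talk, is to give points the same label when their final positions fall within a tolerance of each other. The helper name `group_converged` and the tolerance value are hypothetical.

```python
import numpy as np


def group_converged(final_positions, tol):
    """Label points; a point joins the first group whose first member is within `tol`."""
    labels = -np.ones(len(final_positions), dtype=int)
    centers = []                                   # first member of each group
    for i, p in enumerate(final_positions):
        for label, c in enumerate(centers):
            if np.linalg.norm(p - c) < tol:
                labels[i] = label
                break
        else:
            # No existing group is close enough, so start a new one.
            centers.append(p)
            labels[i] = len(centers) - 1
    return labels


# Hand-made final positions (assumed): points collapsed near two minima.
final_positions = np.array([
    [0.0, 0.0], [0.01, -0.02], [5.0, 5.0], [5.02, 4.99], [0.0, 0.01],
])
labels = group_converged(final_positions, tol=0.5)
n_clusters = labels.max() + 1   # here: labels are [0, 0, 1, 1, 0], so 2 clusters
```

This greedy grouping assumes each cluster collapses to a point; for clusters that settle onto a line or another shape, as the summary notes can happen, a connectivity-based grouping would be a better fit.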

Page 18: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Quantum clustering

• An approximate quantum clustering variant exists that improves time complexity.

• Sensitive to small variations in the data density, unlike purely geometric considerations.

• Possible distributed calculation:
• Since all we have to do is calculate V and ∇V for every data point, the parts can be calculated for each point separately on a different machine

• Performed exceptionally well in exposing hidden patterns and structures in data from a wide range of fields: finance, online marketing, experimental physics, speech recognition, biological data.
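The distributed-calculation point can be sketched on a single machine: the potential at one point does not depend on the potential at any other point, so the points can be split into chunks and each chunk evaluated independently. In this sketch a thread pool stands in for separate machines. The function names and chunking scheme are assumptions, and the potential is the Horn and Gottlieb form evaluated up to the constant E.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def potential_chunk(chunk, data, sigma):
    """V(x) - E for each row of `chunk`, with Gaussians centred on all of `data`."""
    diff = chunk[:, None, :] - data[None, :, :]
    sq = np.sum(diff ** 2, axis=2)
    w = np.exp(-sq / (2.0 * sigma ** 2))
    d = data.shape[1]
    return -d / 2.0 + (w * sq).sum(axis=1) / (2.0 * sigma ** 2 * w.sum(axis=1))


def potential_parallel(data, sigma, n_workers=4):
    """Evaluate the potential at every data point, one chunk per worker."""
    chunks = np.array_split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker needs the full data set but only its own chunk of points.
        parts = list(pool.map(lambda c: potential_chunk(c, data, sigma), chunks))
    return np.concatenate(parts)


rng = np.random.default_rng(1)
data = rng.normal(size=(200, 3))
v_parallel = potential_parallel(data, sigma=1.0)
v_serial = potential_chunk(data, data, sigma=1.0)
```

Note that every worker still reads the whole data set, so the work is split across machines but the data is not; reducing that cost is where an approximate variant would come in.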

Page 19: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Quantum clustering

• Physics may provide an interesting perspective on questions that at first glance have no connection to physics.

• It has been done in:
• Scale-space theory
• Simulated annealing
• Bioinformatics, for extracting protein structure
• And many more

• Next steps: implement in a distributed manner, examine this algorithm on web data, improve time complexity, explore approximate QC, theoretical research.

Page 20: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

Thank You!

Page 21: "Quantum Clustering - Physics Inspired Clustering Algorithm", Sigalit Bechler, Researcher at Similar Web

References

Get to know SimilarWeb: https://www.similarweb.com/

Prof. David Horn homepage: http://horn.tau.ac.il/