Advanced Multimedia
Text Clustering
Tamara Berg
Reminder - Classification

• Given some labeled training documents
• Determine the best label for a test (query) document
What if we don’t have labeled data?

• We can’t do classification.
• What can we do?
– Clustering - the assignment of objects into groups (called clusters) so that objects from the same cluster are more similar to each other than objects from different clusters.
– Often similarity is assessed according to a distance measure.
– Clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.
Any of the similarity metrics we talked about before (SSD, angle between vectors)
Document Clustering

Clustering is the process of grouping a set of documents into clusters of similar documents.

Documents within a cluster should be similar.

Documents from different clusters should be dissimilar.
[Slides 11–17: real-world clustering examples, including Google News and Flickr Clusters]

Source: Hinrich Schutze
How to cluster Documents
Reminder - Vector Space Model

• Documents are represented as vectors in term space
• A vector distance/similarity measure between two documents is used to compare documents
Slide from Mitch Marcus
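One common "angle between vectors" measure from earlier in the course is cosine similarity. Here is a minimal sketch (the toy document vectors are illustrative, not from the slides):

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Two toy document vectors over the same 4-term vocabulary
d1 = [10, 5, 3, 0]
d2 = [5, 0, 7, 0]
sim = cosine_similarity(d1, d2)  # close to 1 means very similar documents
```

Cosine is often preferred over raw distance for documents because it ignores document length: a document concatenated with itself has the same direction, hence similarity 1 with the original.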
Document Vectors: One location for each word.

[Table: term-document count matrix; columns = terms nova, galaxy, heat, h’wood, film, role, diet, fur; rows = documents A–I]

“Nova” occurs 10 times in text A. “Galaxy” occurs 5 times in text A. “Heat” occurs 3 times in text A. (Blank means 0 occurrences.)

Slide from Mitch Marcus
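A count matrix like the one on this slide can be built directly from raw text. A minimal sketch (the two toy documents are my own, not from the slides):

```python
from collections import Counter

def count_vectors(docs):
    """Build one count vector per document over a shared, sorted vocabulary."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))          # union of all terms seen
    # Counter returns 0 for absent terms, matching the "blank means 0" convention
    return vocab, [[c[t] for t in vocab] for c in counts]

docs = ["nova nova galaxy heat", "film role film"]
vocab, vectors = count_vectors(docs)
# vocab   -> ['film', 'galaxy', 'heat', 'nova', 'role']
# vectors -> [[0, 1, 1, 2, 0], [2, 0, 0, 0, 1]]
```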
TF x IDF Calculation

w_ik = tf_ik * log(N / n_k)

T_k = term k in document D_i
tf_ik = frequency of term T_k in document D_i
idf_k = inverse document frequency of term T_k in C, where idf_k = log(N / n_k)
N = total number of documents in the collection C
n_k = the number of documents in C that contain T_k

Each document A is then represented by a vector of term weights [W1 W2 W3 … Wn].

Slide from Mitch Marcus
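The formula above can be sketched directly from a matrix of raw term counts (the two-document toy input is my own):

```python
import math

def tf_idf(count_vectors):
    """w_ik = tf_ik * log(N / n_k), computed from raw term counts.

    A term appearing in every document gets weight 0, since log(N/N) = 0.
    """
    N = len(count_vectors)
    n_terms = len(count_vectors[0])
    # n_k: number of documents containing term k
    n = [sum(1 for doc in count_vectors if doc[k] > 0) for k in range(n_terms)]
    return [[tf * math.log(N / n[k]) if n[k] else 0.0
             for k, tf in enumerate(doc)]
            for doc in count_vectors]

counts = [[10, 5, 3],     # document A: term counts for 3 terms
          [0, 5, 0]]      # document B
w = tf_idf(counts)        # w[0][0] = 10 * log(2); shared term gets weight 0
```

Note that the middle term occurs in both documents (n_k = N = 2), so its weight is 0 in every document: terms that appear everywhere carry no discriminative information.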
Features

A = [F1 F2 F3 … Fn]

Define whatever features you like:
• Length of longest string of CAP’s
• Number of $’s
• Useful words for the task
• …
Similarity between documents

A = [10 5 3 0 0 0 0 0]
G = [5 0 7 0 0 9 0 0]
E = [0 0 0 0 0 10 10 0]

Sum of Squared Distances (SSD) = Σ_{i=1}^{n} (X_i − Y_i)²

SSD(A,G) = ?
SSD(A,E) = ?
SSD(G,E) = ?
Which pair of documents are the most similar?
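The exercise above can be checked with a few lines of code:

```python
def ssd(x, y):
    """Sum of squared distances between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

A = [10, 5, 3, 0, 0, 0, 0, 0]
G = [5, 0, 7, 0, 0, 9, 0, 0]
E = [0, 0, 0, 0, 0, 10, 10, 0]

print(ssd(A, G))  # 147
print(ssd(A, E))  # 334
print(ssd(G, E))  # 175
```

Since a smaller SSD means more similar, A and G are the most similar pair.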
K-means clustering

• Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k:

D(X, M) = Σ_k Σ_{i ∈ cluster k} (x_i − m_k)²

source: Svetlana Lazebnik
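A minimal sketch of the standard Lloyd iteration for this objective, assuming random initialization from the data points (the toy 2-D data is my own; the slides use document vectors):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid recomputation."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initialize from k random points
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        # Update step: recompute each center as the mean of its cluster.
        new_centers = [[sum(dim) / len(c) for dim in zip(*c)] if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:           # fixed point reached
            break
        centers = new_centers
    return centers, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)      # recovers the two obvious groups
```

Each assignment step and each update step can only lower (or keep equal) the objective D(X, M), which is why the loop is guaranteed to stop.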
Convergence of K Means

• K-means converges to a fixed point in a finite number of iterations.

Proof:
• The sum of squared distances (RSS) decreases during reassignment (because each vector is moved to a closer centroid).
• RSS decreases during recomputation.
• Thus: we must reach a fixed point.
• But we don’t know how long convergence will take!
• If we don’t care about a few docs switching back and forth, then convergence is usually fast (< 10-20 iterations).

Source: Hinrich Schutze
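The proof argument can be observed numerically: record RSS after every full Lloyd iteration and it never increases. A small sketch (toy data and fixed starting centers are my own choices):

```python
def rss(clusters, centers):
    """Residual sum of squares: total squared distance of points to their centers."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, c))
               for c, pts in zip(centers, clusters) for p in pts)

def lloyd_iterations(points, centers, iters=10):
    """Run Lloyd's algorithm for a fixed number of iterations, recording RSS."""
    history = []
    for _ in range(iters):
        clusters = [[] for _ in centers]     # reassignment step
        for p in points:
            j = min(range(len(centers)),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        centers = [[sum(d) / len(c) for d in zip(*c)] if c else centers[j]
                   for j, c in enumerate(clusters)]   # recomputation step
        history.append(rss(clusters, centers))
    return history

pts = [(0, 0), (0, 2), (2, 0), (9, 9), (9, 11), (11, 9)]
history = lloyd_iterations(pts, centers=[(0, 0), (0, 2)])
# history is non-increasing: both steps can only lower RSS
```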
Hierarchical clustering strategies

• Agglomerative clustering
– Start with each point in a separate cluster
– At each iteration, merge two of the “closest” clusters

• Divisive clustering
– Start with all points grouped into a single cluster
– At each iteration, split the “largest” cluster

source: Svetlana Lazebnik
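The agglomerative strategy above can be sketched in a few lines, assuming single-link distance (closest pair of members) as the notion of "closest" clusters; the toy points are my own:

```python
from itertools import combinations

def agglomerative(points, k):
    """Bottom-up clustering: repeatedly merge the two closest clusters
    (single-link distance) until only k clusters remain."""
    clusters = [[p] for p in points]            # start: one cluster per point

    def single_link(c1, c2):
        # Distance between the closest pair of members of the two clusters.
        return min(sum((a - b) ** 2 for a, b in zip(p, q))
                   for p in c1 for q in c2)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]                          # j > i, so index i stays valid
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
out = agglomerative(points, k=3)  # two tight pairs plus the lone outlier
```

Running the merges all the way to k = 1 and recording the merge order would give the full dendrogram; stopping early, as here, gives a flat k-clustering.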
Divisive Clustering

• Top-down (instead of bottom-up as in Agglomerative Clustering)
• Start with all docs in one big cluster
• Then recursively split clusters
• Eventually each node forms a cluster on its own.

Source: Hinrich Schutze
Flat or hierarchical clustering?

• For high efficiency, use flat clustering (e.g. k-means)
• For deterministic results: hierarchical clustering
• When a hierarchical structure is desired: hierarchical algorithm
• Hierarchical clustering can also be applied if K cannot be predetermined (can start without knowing K)

Source: Hinrich Schutze
For Thurs
• Read Chapter 6 of textbook