Low computational cost algorithms for photo clustering and mail signature detection in the cloud
![Page 1: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/1.jpg)
Low computational cost algorithms for photo clustering and mail signature detection in the cloud
Daniel Manchón
Co-directors: Xavi Giró (UPC), Omar Pera (Pixable)
![Page 2: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/2.jpg)
Outline
• Motivation
• Tasks summary
• Pixable internship
• GPI research assistant
• Photo clustering
• Mail signature detection
• Conclusions
• Introduction
• Requirements
• Design
• Results
![Page 3: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/3.jpg)
Motivation: Photo clustering
![Page 4: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/4.jpg)
Motivation: Mail signature detection
![Page 5: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/5.jpg)
Motivation: Cloud computing
![Page 6: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/6.jpg)
Outline: Pixable internship
![Page 7: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/7.jpg)
Pixable internship
- Social photos aggregation
- Photo ranking
- Editorial content
- Contacts feeds
- Owned by Singtel

- Photo storage
- Synchronization across multiple devices
- Support for RAW

- CallerID application
- Multiple contact source support
- Contact backup and synchronization
- SPAM detection
![Page 8: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/8.jpg)
Photofeed tasks
• Instagram source (in-production)
• Referrals and invitations method
• "New Relic" integration
• Photo clustering and summarization
• Photo download service (in-production)
![Page 9: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/9.jpg)
Contactive tasks
• Mail scraping monitoring
• Signature detection
• Identity analysis improvement
• Tooling (in-production)
![Page 10: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/10.jpg)
Outline: GPI research assistant
![Page 11: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/11.jpg)
GPI research assistant
GPI: Image and Video Processing Group
• MediaEval 2013 (published paper)
• ICMR SEWM (published paper, multimedia retrieval conference)
• Pyxel software framework
• MediaEval 2014
![Page 12: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/12.jpg)
Outline: Photo clustering, Introduction
![Page 13: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/13.jpg)
Photo Clustering: Intro
PhotoTOC [Platt et al, PACRIM 2003]
State of the art
Event detection
![Page 14: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/14.jpg)
Outline: Photo clustering, Requirements
![Page 15: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/15.jpg)
Photo Clustering: Requirements
Photofeed constraints
• User data stored in Amazon cloud and MongoDB
• Low computational cost
• Easily configurable through a REST API
MediaEval requirements
• Event generation
• Visual and metadata information available
• F1 and NMI as evaluation metrics
• 400k annotated photo dataset
![Page 16: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/16.jpg)
Outline: Photo clustering, Design
![Page 17: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/17.jpg)
Design
Hi, I’m John. Hi, I’m Emily.
(a) Temporal sorting of each user's photos independently
![Page 18: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/18.jpg)
Design
(b) Temporal-based oversegmentation into mini-clusters
PhotoTOC [Platt et al., PACRIM 2003]
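As a rough illustration of step (b), the sketch below splits one user's time-sorted photos into mini-clusters whenever the gap between consecutive photos exceeds a threshold. The fixed `max_gap` and the function name `oversegment` are assumptions made for this example; PhotoTOC is usually described as using an adaptive threshold on log time gaps, which this simplification does not reproduce. The timestamps are illustrative.

```python
from datetime import datetime, timedelta


def oversegment(timestamps, max_gap=timedelta(hours=1)):
    """Split chronologically sorted timestamps into mini-clusters.

    A new mini-cluster starts whenever the gap to the previous photo
    exceeds max_gap (a hypothetical fixed threshold, not PhotoTOC's
    adaptive one).
    """
    clusters = []
    for ts in timestamps:
        if clusters and ts - clusters[-1][-1] <= max_gap:
            clusters[-1].append(ts)   # close enough in time: same mini-cluster
        else:
            clusters.append([ts])     # large gap (or first photo): new mini-cluster
    return clusters


# Step (a): sort one user's photos by time before segmenting.
one_user = sorted(datetime.fromisoformat(s) for s in [
    "2010-12-14 23:11:10",
    "2010-12-13 02:11:10",
    "2010-12-13 03:11:10",
])
mini_clusters = oversegment(one_user)
print([len(c) for c in mini_clusters])
```

A low threshold deliberately produces too many small clusters; the later merging step is what joins them back into events.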
![Page 19: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/19.jpg)
Design
(b) Temporal-based oversegmentation into mini-clusters, modelled by their mean values
Username = John | Time taken = 2010-09-10 02:10:12 | GPS = (42.1, -10) | tags = live, stage, deerhunter
Username = emily | Time taken = 2010-12-13 02:11:10 | GPS = (43, -8.40) | tags = live, deerhunter
Username = emily | Time taken = 2010-12-13 03:11:10 | GPS = (no data) | tags = live, stones
Username = emily | Time taken = 2010-12-14 23:11:10 | GPS = (43.2, -8.2) | tags = sound, test
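One way to read "mean values" is that each mini-cluster is summarized by a single record. The sketch below assumes the mean timestamp, the mean GPS position over photos that have one, and the union of tags; the slides do not say how tags or missing GPS data are handled, so those two choices, the grouping of the two photos into one mini-cluster, and the name `summarize` are assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean

# Two photos taken from the metadata example, assumed to share a mini-cluster.
photos = [
    {"user": "emily", "taken": "2010-12-13 02:11:10",
     "gps": (43.0, -8.40), "tags": {"live", "deerhunter"}},
    {"user": "emily", "taken": "2010-12-13 03:11:10",
     "gps": None, "tags": {"live", "stones"}},
]


def summarize(cluster):
    """Collapse a mini-cluster into one representative record."""
    times = [datetime.fromisoformat(p["taken"]) for p in cluster]
    base = min(times)
    # Average the offsets from the earliest photo to get the mean time.
    mean_time = base + sum((t - base for t in times), timedelta()) / len(times)
    # Photos without GPS are skipped (assumption).
    coords = [p["gps"] for p in cluster if p["gps"] is not None]
    mean_gps = (mean(c[0] for c in coords), mean(c[1] for c in coords)) if coords else None
    # Tags have no numeric mean, so the union is used here (assumption).
    tags = set().union(*(p["tags"] for p in cluster))
    return {"user": cluster[0]["user"], "taken": mean_time, "gps": mean_gps, "tags": tags}


summary = summarize(photos)
print(summary)
```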
![Page 20: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/20.jpg)
Design
(c) Sequential merging of mini-clusters
[Figure: Δt between consecutive mini-clusters, each summarized by avg(·)]
![Page 21: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/21.jpg)
Design
(c) Sequential merging of mini-clusters
![Page 22: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/22.jpg)
Outline: Photo clustering, Results
![Page 23: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/23.jpg)
Results
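$e_x$: event containing photo $x$; $c_x$: cluster containing photo $x$

$$R_x = \frac{|c_x \cap e_x|}{|e_x|}$$

$P_x$, $R_x$: per-photo Precision ($P$) and Recall ($R$)

$$F_1 = \frac{2PR}{P + R}$$

UPC: 3rd place of 12 teams!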
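The metric on this slide can be sketched as follows. Only the recall formula survives in the transcript, so the symmetric per-photo precision (dividing by the cluster size instead of the event size) and the plain averaging over photos are assumptions, as are the function name and the toy labels.

```python
def clustering_f1(predicted, truth):
    """predicted / truth map photo id -> cluster label / event label."""
    def members(labels):
        # For every photo, the set of photos sharing its label.
        groups = {}
        for photo, label in labels.items():
            groups.setdefault(label, set()).add(photo)
        return {photo: groups[label] for photo, label in labels.items()}

    c, e = members(predicted), members(truth)
    photos = list(truth)
    # Per-photo precision (assumed) and recall, averaged over all photos.
    precision = sum(len(c[x] & e[x]) / len(c[x]) for x in photos) / len(photos)
    recall = sum(len(c[x] & e[x]) / len(e[x]) for x in photos) / len(photos)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


truth = {"a": "concert", "b": "concert", "c": "concert", "d": "soundcheck"}
predicted = {"a": 1, "b": 1, "c": 2, "d": 2}
p, r, f1 = clustering_f1(predicted, truth)
print(p, r, f1)
```

Every photo belongs to its own cluster and its own event, so the intersections are never empty and the divisions are safe.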
![Page 24: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/24.jpg)
Outline: Mail signature detection, Introduction
![Page 25: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/25.jpg)
Mail signature detection: Intro
Key topics
• Email information extraction
• SPAM detection
• Low computation
State of the art
Learning to extract signature and reply lines from email [Vitor R. Carvalho and William W. Cohen, 2004]
![Page 26: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/26.jpg)
Outline: Mail signature detection, Requirements
![Page 27: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/27.jpg)
Mail signature detection: Requirements
• Mail scraping service improvement
• Pre-process the input to reduce the execution time
• Adapt the mail scraping service to the Contactive product
[Figure labels: User mailbox; filter only signatures (less information); Mail scraping service; MongoDB entries]
Example entry: id 89012 | name John Doe | email [email protected] | Id 7788455367_e | phone 789675463
![Page 28: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/28.jpg)
Outline: Mail signature detection, Design
![Page 29: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/29.jpg)
Design
2. Problem Definition and Corpus
A signature block is the set of lines, usually in the end of a message, that contain information about the sender,
such as personal name, affiliation, postal address, web address, email address, telephone number, etc. Quotes from famous persons and creative ASCII drawings are often present in this block also. An example of a signature block can be seen in last six lines of the email message pictured in Figure 1 (marked with the line label <sig>). Figure 1 also contains six lines of text that were quoted from a preceding message (marked with the line label <reply>). In this paper we will call such lines reply lines.
<other> From: [email protected]
<other> To: Vitor Carvalho <[email protected]>
<other> Subject: Re: Did you try to compile javadoc recently?
<other> Date: 25 Mar 2004 12:05:51 -0500
<other>
<other> Try cvs update –dP, this removes files & directories that have been
<other> deleted from cvs.
<other> - W
<other>
<reply> On Wed, 2004-03-24 at 19:58, Vitor Carvalho wrote:
<reply> > I’ve just checked-out the baseline m3 code and
<reply> > "Ant dist" is working fine, but "ant javadoc" is not.
<reply> > Thanks
<reply> > Vitor
<other>
<sig> ------------------------------------------------------------------
<sig> William W. Cohen “Would you drive a mime
<sig> [email protected] nuts if you played a
<sig> http://www.wcohen.com blank audio tape
<sig> Associate Research Professor full blast?”
<sig> CALD, Carnegie-Mellon University - S. Wright
Figure 1 - Excerpt from a labeled email message
Below we first consider the task of detecting signature blocks—that is, classifying messages as to whether or
not they contain a signature block. We next consider signature line extraction. This is the task of classifying lines within a message as to whether or not they belong to a signature block. In our experiments, we perform signature line extraction only on messages which are known to contain a signature block.
To obtain a corpus of messages for signature block detection, we began with messages from the 20 Newsgroups dataset (Lang, 1995). We began by separating the messages into two groups P and N, using the following heuristic. We first looked for pairs of messages from the same sender and whose last T lines were identical. If T was larger than or equal to 6, then one of the messages from this sender (randomly chosen) was placed in group P (which contains messages likely to have a signature block). If T was less than or equal to 1, a sample message from this sender was placed in group N. These groups were supplemented with messages from our personal inboxes (to provide a sample of more recent emails) and manually checked for correctness. This resulted in a final set of 617 messages (all from different senders) containing a signature block, and a set of 586 messages not having a signature block.
For the extraction experiments, the 617-message dataset was manually annotated for signature lines. It was also annotated for reply lines (as in Figure 1). As noted above, the identification of reply lines can be helpful in tasks such as email threading, and certain types of content-based message classification; and as we will demonstrate below, our signature line extraction techniques can also be successfully applied to identifying reply lines. The final dataset has 33,013 lines. Of these, 3,321 lines are in signature blocks, and 5,587 are reply lines.
(a) Split off the last K lines of each mail and retrieve their annotations
Last K lines
Ground truth annotations
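Step (a) amounts to keeping only the tail of each message together with its labels. A minimal sketch, assuming one ground-truth label per line; the function name `last_k_lines`, the label strings and the sample message are made up for the example.

```python
def last_k_lines(message, labels, k):
    """Return the last k lines of a message and their ground-truth labels."""
    if k <= 0:
        raise ValueError("k must be positive")
    lines = message.splitlines()
    if len(lines) != len(labels):
        raise ValueError("one label per line is required")
    # Slicing with -k keeps everything when the message has fewer than k lines.
    return lines[-k:], labels[-k:]


message = (
    "Hi Vitor,\n"
    "Try cvs update.\n"
    "- W\n"
    "--\n"
    "William W. Cohen\n"
    "Carnegie Mellon University"
)
labels = ["other", "other", "other", "sig", "sig", "sig"]
tail, tail_labels = last_k_lines(message, labels, k=4)
print(list(zip(tail_labels, tail)))
```

Restricting the classifier to the last K lines is what keeps the per-message cost low, since signatures usually sit at the end of a message.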
![Page 30: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/30.jpg)
Design
(b) Feature extraction
Lines
N feature patterns
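To make step (b) concrete, the sketch below turns each line into a binary vector, one entry per pattern. The five regular expressions are hypothetical stand-ins chosen for this example; they are not the N feature patterns used in the project or in the cited paper.

```python
import re

# Hypothetical feature patterns, one binary feature each.
FEATURE_PATTERNS = [
    ("has_email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("has_url", re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)),
    ("has_phone", re.compile(r"\+?\d[\d\s().-]{6,}\d")),
    ("is_separator", re.compile(r"^\s*[-_=*]{2,}\s*$")),
    ("is_quoted", re.compile(r"^\s*>")),
]


def extract_features(line):
    """Map one line to a list of 0/1 values, one per pattern."""
    return [1 if pattern.search(line) else 0 for _, pattern in FEATURE_PATTERNS]


lines = [
    "> Thanks",
    "------",
    "http://www.wcohen.com",
    "Call me at +1 412 555 0100",
]
matrix = [extract_features(line) for line in lines]
for line, row in zip(lines, matrix):
    print(row, line)
```

Stacking the vectors of K lines gives a K×N feature matrix, here with N = 5.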
![Page 31: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/31.jpg)
Design
(c) SVM training and model generation
Feature matrix [K×N] + ground-truth vector [K] → SVM training → Model
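The slides do not say which SVM implementation, kernel or parameters were used. As a dependency-free illustration only, the sketch below trains a binary linear SVM (signature versus everything else) by sub-gradient descent on the hinge loss; this is a swapped-in technique, and the learning rate, regularization, epoch count and toy features are all assumptions. Separating Other, Reply and Signature would need, for example, one such model per class.

```python
def train_linear_svm(X, y, epochs=500, lr=0.05, reg=0.01):
    """Binary linear SVM via sub-gradient descent on the hinge loss.

    X: K rows of N features; y: K labels in {-1, +1}.
    Returns the model as (weights, bias).
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 regularization shrinks the weights at every step.
            w = [wj - lr * reg * wj for wj in w]
            if margin < 1:
                # The sample violates the margin: move the hyperplane towards it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(model, x):
    """Return +1 (signature) or -1 (not signature) for one feature vector."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy feature matrix [K=5, N=3]: has_email, has_url, is_quoted (hypothetical).
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0], [1, 1, 0]]
y = [1, 1, -1, -1, 1]   # ground-truth vector [K]
model = train_linear_svm(X, y)
print([predict(model, x) for x in X])
```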
![Page 32: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/32.jpg)
Design
(c) SVM training and model generation
[Figure: Lines → pre-process → Features → Model → Classes (Other, Reply, Signature)]
![Page 33: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/33.jpg)
Outline: Mail signature detection, Results
![Page 34: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/34.jpg)
Results
With annotated dataset:

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

Without annotated dataset: manual evaluation on Contactive user base mailboxes
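For the annotated case, the three metrics can be computed from per-line predictions as sketched below. Treating "sig" as the positive class and returning 0.0 when a denominator is zero are assumptions made for this example, as are the label strings.

```python
def count_outcomes(predicted, truth, positive="sig"):
    """Count true positives, false positives and false negatives."""
    pairs = list(zip(predicted, truth))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1; 0.0 is returned when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


truth = ["other", "sig", "sig", "sig", "other"]
predicted = ["other", "sig", "sig", "other", "sig"]
tp, fp, fn = count_outcomes(predicted, truth)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, precision, recall, f1)
```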
![Page 35: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/35.jpg)
![Page 36: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/36.jpg)
Conclusions
• Academic
  • Papers: MediaEval 2013 and ICMR SEWM; MediaEval 2014 in preparation.
  • UPC Pyxel framework foundations
• Industrial
  • Contributions to Pixable in production servers:
    • Instagram integration
    • Photofeed Downloader
  • Mail signature detection: proof of concept successful.
  • Work in the USA
![Page 37: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/37.jpg)
Thank you very much!
Q&A
![Page 38: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/38.jpg)
BACKUP SLIDES
![Page 39: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/39.jpg)
Design
(c) Sequential merging of mini-clusters
Weighted modalities
● creation (or upload) time
● geolocation
● textual labels
● same user
![Page 40: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/40.jpg)
Design
(c) Sequential merging of mini-clusters
Time stamp (d = L1)
Geolocation (d = haversine)
Text labels (d = Jaccard)
Same user (d = boolean)
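The four distances named on this slide can be sketched as below. The units (seconds, kilometres), the Earth radius constant, the convention that two empty tag sets have distance 0, and the choice of 0 for "same user" are assumptions; the function names are illustrative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def time_distance(t1, t2):
    """L1 distance between two timestamps given in seconds."""
    return abs(t1 - t2)


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def jaccard_distance(tags1, tags2):
    """1 minus the Jaccard similarity of two tag sets."""
    union = tags1 | tags2
    if not union:
        return 0.0  # two empty tag sets are treated as identical (assumption)
    return 1 - len(tags1 & tags2) / len(union)


def user_distance(user1, user2):
    """Boolean distance: 0 for the same user, 1 otherwise."""
    return 0.0 if user1 == user2 else 1.0


print(time_distance(3600, 0))
print(haversine_km((43.0, -8.40), (43.2, -8.2)))
print(jaccard_distance({"live", "stage", "deerhunter"}, {"live", "deerhunter"}))
print(user_distance("John", "emily"))
```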
![Page 41: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/41.jpg)
Design
(c) Sequential merging of mini-clusters
![Page 42: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/42.jpg)
Design
(c) Sequential merging of mini-clusters
Mean and standard deviation learned on pairs of photos within the same training event.
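A possible reading of this slide: for each modality, collect the distances between all pairs of photos that share a training event, then store their mean and standard deviation so that new distances can be expressed as z-scores. The use of the population standard deviation, the z-score normalization itself and the toy timestamps are assumptions.

```python
from itertools import combinations
from statistics import mean, pstdev


def learn_distance_stats(events, distance):
    """events: one list of feature values per training event.

    Returns the mean and population standard deviation of the distances
    between all pairs of photos that belong to the same event.
    """
    distances = [
        distance(a, b)
        for event in events
        for a, b in combinations(event, 2)
    ]
    return mean(distances), pstdev(distances)


def normalize(d, mu, sigma):
    """Express a distance as a z-score (0.0 if sigma is zero)."""
    return (d - mu) / sigma if sigma else 0.0


# Toy training events: photo timestamps in seconds.
events = [[0, 60, 180], [1000, 1120]]
mu, sigma = learn_distance_stats(events, lambda a, b: abs(a - b))
z = normalize(180, mu, sigma)
print(mu, sigma, z)
```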
![Page 43: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/43.jpg)
Design
(c) Sequential merging of mini-clusters
phi function
![Page 44: Low computational cost algorithms for photo clustering and mail signature detection in the cloud](https://reader035.fdocuments.us/reader035/viewer/2022081404/559629301a28ab905a8b47fe/html5/thumbnails/44.jpg)
Design
(c) Sequential merging of mini-clusters
decision threshold
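Putting the backup slides together, the merge decision might look like the sketch below. It assumes that the "phi function" is the standard normal CDF applied to z-scored distances, that the modalities are combined by a weighted average, and that two consecutive mini-clusters are merged when the result falls below the decision threshold. None of these details are confirmed by the slides, and the weights, threshold and z-scores are invented for the example. Comparing only consecutive mini-clusters is a further simplification.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF (assumed meaning of the 'phi function')."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def combined_distance(z_scores, weights):
    """Weighted average of phi(z) over the modalities, in [0, 1]."""
    total = sum(weights.values())
    return sum(weights[m] * phi(z) for m, z in z_scores.items()) / total


def sequential_merge(mini_clusters, pair_z_scores, weights, threshold=0.5):
    """Merge time-ordered mini-clusters into events.

    pair_z_scores[i] holds the z-scored distances between mini-cluster i
    and mini-cluster i + 1.
    """
    if not mini_clusters:
        return []
    events = [[mini_clusters[0]]]
    for cluster, z in zip(mini_clusters[1:], pair_z_scores):
        if combined_distance(z, weights) < threshold:
            events[-1].append(cluster)   # close enough: same event
        else:
            events.append([cluster])     # too far: start a new event
    return events


weights = {"time": 2.0, "geo": 1.0, "tags": 1.0}   # hypothetical weights
pair_z_scores = [
    {"time": -1.0, "geo": -0.5, "tags": 0.0},      # A vs B: close
    {"time": 2.0, "geo": 1.5, "tags": 1.0},        # B vs C: far
]
events = sequential_merge(["A", "B", "C"], pair_z_scores, weights)
print(events)
```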