15 - Phylogeny

download 15 - Phylogeny

of 171

Transcript of 15 - Phylogeny

  • 8/16/2019 15 - Phylogeny

    1/171

    Prof. Dr. Anderson [email protected]

    http://www.ic.unicamp.br/~rocha

    Reasoning for Complex Data (RECOD) Lab.Institute of Computing, Unicamp

    Av. Albert Einstein, 1251 - Cidade Universitária

    CEP 13083-970 • Campinas/SP - Brasil

    http://www.ic.unicamp.br/~rochahttp://www.ic.unicamp.br/~rochamailto:[email protected]:[email protected]

  • 8/16/2019 15 - Phylogeny

    2/171

    A. Rocha, 2012 – Análise Forense de Documentos Digitais 4

    Summary

    ! Phylogeny Problem

    ! Image Phylogeny

    ! Video Phylogeny

    ! Current Challenges

  • 8/16/2019 15 - Phylogeny

    3/171

    Phylogeny of Media

    Parts of this class were presented in talk in Feb. 2nd, 2012 at Google, US,

    by S. Goldenstein

     Joint work – S. Goldenstein, Z. Dias and A. Rocha

  • 8/16/2019 15 - Phylogeny

    4/171

    What is this talk about?

    4

  • 8/16/2019 15 - Phylogeny

    5/171

    What is this talk about?

    • An important problem that has beenmostly overlooked by the community.

    4

  • 8/16/2019 15 - Phylogeny

    6/171

    What is this talk about?

    • An important problem that has beenmostly overlooked by the community.

    • Has immediate applications in many areas.

    4

  • 8/16/2019 15 - Phylogeny

    7/171

    What is this talk about?

    • An important problem that has beenmostly overlooked by the community.

    • Has immediate applications in many areas.• It is hard to solve.

    4

  • 8/16/2019 15 - Phylogeny

    8/171

    What is this talk about?

    • An important problem that has beenmostly overlooked by the community.

    • Has immediate applications in many areas.• It is hard to solve.

    • There is room for elegant math anddifferent solutions.

    4

  • 8/16/2019 15 - Phylogeny

    9/171

    What is this talk about?

    • An important problem that has beenmostly overlooked by the community.

    • Has immediate applications in many areas.• It is hard to solve.

    • There is room for elegant math anddifferent solutions.

    Video Phylogeny: Recovering Near-Duplicate Video RelationshipsZ Dias, A Rocha, and S Goldenstein

    IEEE Workshop on Information Forensics and

    Security (WIFS) 2011

    Image Phylogeny by MinimalSpanning Trees Z Dias, and A Rocha and S Goldenstein

    IEEE Transactions of Information Forensics and

    Security, April 2012.4

  • 8/16/2019 15 - Phylogeny

    10/171

    How it started

    In 2009, the current Brazilian president was the

    president’s chief of staff, and the government pre-

    candidate for the 2010 presidential election.Folha de SP , a major Brazilian newspaper (think New

    York Times) ran an interview and article about her.

    They printed a “scan of her criminal records” as

    political activist in the military dictatorship period

    (1964-1985), suggesting it as a record she engaged in

    violent armed activities (which she denies to this day).

    5

  • 8/16/2019 15 - Phylogeny

    11/171

    How it started

    6

  • 8/16/2019 15 - Phylogeny

    12/171

    How it started

    6

    In 2009, the current Brazilian president was just apre-candidate (for the 2010 presidential election).

  • 8/16/2019 15 - Phylogeny

    13/171

    How it started

    6

    In 2009, the current Brazilian president was just apre-candidate (for the 2010 presidential election).

    The newspaper A Folha de SP  (think New York Times)ran article about her.

  • 8/16/2019 15 - Phylogeny

    14/171

    How it started

    6

    In 2009, the current Brazilian president was just apre-candidate (for the 2010 presidential election).

    The newspaper A Folha de SP  (think New York Times)ran article about her.

    It had a “scan of her criminal files” as political activistin the military dictatorship period (1964-1985).

  • 8/16/2019 15 - Phylogeny

    15/171

    7

  • 8/16/2019 15 - Phylogeny

    16/171

    Criminal

    Records?

    A “scan” of her personal

    files maintained by the

    military internal security

    during the Brazilian

    military regime.

    The Public Archive of SP

    actually hosts such a

    collection. 8

  • 8/16/2019 15 - Phylogeny

    17/171

    Searching the Web...

    9

  • 8/16/2019 15 - Phylogeny

    18/171

    Searching the Web...

    10

  • 8/16/2019 15 - Phylogeny

    19/171

    Criminal Records?

    11

  • 8/16/2019 15 - Phylogeny

    20/171

    Criminal Records?

    • This image was already going around thenet for " six months ! it is a clear fake.

    11

  • 8/16/2019 15 - Phylogeny

    21/171

    Criminal Records?

    • This image was already going around thenet for " six months ! it is a clear fake.

    • She hired us, as consultants, to provide aforensic analysis of its authenticity thatcould hold on court.

    11

  • 8/16/2019 15 - Phylogeny

    22/171

    Criminal Records?

    • This image was already going around thenet for " six months ! it is a clear fake.

    • She hired us, as consultants, to provide aforensic analysis of its authenticity thatcould hold on court.

    •There were several versions of the image

    (near duplicates)

    # Which one was the original?

    # Where should we perform the analysis?

    11

  • 8/16/2019 15 - Phylogeny

    23/171

    How to find the Original?

    • The images are “copied” around...- resized;

    - cropped;

    - color corrected;

    - recompressed;

    - and possibly other transformations.

    12

  • 8/16/2019 15 - Phylogeny

    24/171

    Media Phylogeny

    • Identify, among a set of near duplications,which element is the original, and thestructure of generation of each near

    duplication.

    • Tells the history of the transformationscreated the duplications.

    13

  • 8/16/2019 15 - Phylogeny

    25/171

    Media Phylogeny

    D

    D1   D2

    DDB2

    D3

    D4   D5

    D6

    Watermarking & Fingerprinting Enabled 

    Content-Based Copy 

    Detection & Recognition

    Referenced copy 

    DDB

    DDB1

       T   i  (    D

       )   T       j      (   D   

     )  

       T   k  (    D

       1   )   T    

    l     (   D   2    )  

       T  r  (   D

       3   )   T    

    t     (   D   3    )  

       T   i  (    D

       5   )

       T a  (    D

      D  B   ) T    

    b  

    (   D   D  B    )  

    14

  • 8/16/2019 15 - Phylogeny

    26/171

    Image Phylogeny TreesIPT

  • 8/16/2019 15 - Phylogeny

    27/171

    Image Phylogeny Trees:

    IPT• Security .• Forensics.

    • Copyright enforcement.• News tracking services.

    • Indexing.

    16

  • 8/16/2019 15 - Phylogeny

    28/171

    Image Phylogeny Trees:

    IPT• Security : the modification graph providesinformation of suspects’ behavior, and points outflow of content distribution.

    • Forensics.

    • Copyright enforcement.

    •News tracking services.

    • Indexing.

    17

  • 8/16/2019 15 - Phylogeny

    29/171

    Image Phylogeny Trees:

    IPT• Security .• Forensics: analysis in the original document (root

    of the tree) instead of in a near duplicate.

    • Copyright enforcement.

    • News tracking services.

    • Indexing.

    18

  • 8/16/2019 15 - Phylogeny

    30/171

    Image Phylogeny Trees:

    IPT• Security .• Forensics.

    • Copyright enforcement: traitor tracing withoutthe need of source control techniques(watermarking or fingerprinting).

    •News tracking services.

    • Indexing.

    19

  • 8/16/2019 15 - Phylogeny

    31/171

    Image Phylogeny Trees:

    IPT• Security .• Forensics.

    • Copyright enforcement.• News tracking services: the ND relationships

    can feed news tracking services with key elementsfor determining the opinion forming process acrosstime and space.

    • Indexing.

    20

  • 8/16/2019 15 - Phylogeny

    32/171

    Image Phylogeny Trees:

    IPT• Security .• Forensics.

    • Copyright enforcement.• News tracking services.

    • Indexing: tree root can give us an image from anND set as a representative to index, store, or evenfurther refine the ND search.Tree structure might help indexing and retrieving.

    21

  • 8/16/2019 15 - Phylogeny

    33/171

    A CB

    D E

    F G

    H

    Image Near

    A

    BC D

    G

    H

    E

    F

    Phylogeny

    T ~ β(A)

    T ~ β(A)   ~ β(A)

    T ~ β(A)

    T ~ β(B)

    T ~ β(C )

    T ~ β(D)

    Our Objective

  • 8/16/2019 15 - Phylogeny

    34/171

    Two Subproblems

    1. Define good dissimilarity functions

    between images.

    2. Develop algorithms that construct the

    Image Phylogeny Tree given a dissimilaritymatrix of the images.

    23

    d(i, j)

  • 8/16/2019 15 - Phylogeny

    35/171

    Dissimilarity

  • 8/16/2019 15 - Phylogeny

    36/171

    DissimilarityThe dissimilarity is not a metric - we want to

    estimate how likely A#B and B#A.

    Think about cropping, or resizing an image -

    these are not two-way operations.

    Jeffrey Pine, Sentinel Dome, Yosemite National Park, Ansel Adams. 25

  • 8/16/2019 15 - Phylogeny

    37/171

    Dissimilarity

    • Define a family of image transformations  parameterized by

    • Letfind that minimizes

     

    dβ(i, j) = |I j − T β(I i)|2

    min   dβ(i, j)

    d(i, j) = dβmin(i, j)

    T β(I )   β .

    26

  • 8/16/2019 15 - Phylogeny

    38/171

    Dissimilarity

    • We use a composition of three simple steps.

    • In the general case, finding the optimumparameters of a general transformationmight be a complicated optimization.

    27

    T β(I ) =  T jpeg(T color(T spatial(I ))

  • 8/16/2019 15 - Phylogeny

    39/171

    Dissimilarity

    28

  • 8/16/2019 15 - Phylogeny

    40/171

    Dissimilarity

    • Spatial- Affine Transformation

    - Cropping

    28

  • 8/16/2019 15 - Phylogeny

    41/171

    Dissimilarity

    • Spatial- Affine Transformation

    - Cropping

    • Color- Channel Brightness and Contrast.

    28

  • 8/16/2019 15 - Phylogeny

    42/171

    Dissimilarity

    • Spatial- Affine Transformation

    - Cropping

    • Color- Channel Brightness and Contrast.

    •  JPEG Compression- Quantization tables.

    28

  • 8/16/2019 15 - Phylogeny

    43/171

    Spatial Transformation

    for the Dissimilarit

    Afghanistan-Pakistan border, Steve McCurry. 29

  • 8/16/2019 15 - Phylogeny

    44/171

    Spatial Transformation

    for the DissimilaritKey Points.

    Afghanistan-Pakistan border, Steve McCurry. 29

  • 8/16/2019 15 - Phylogeny

    45/171

    Spatial Transformation

    for the DissimilaritKey Points.

    Afghanistan-Pakistan border, Steve McCurry. 29

  • 8/16/2019 15 - Phylogeny

    46/171

    Spatial Transformation

    for the Dissimilarit

    Correspondences.

    Key Points.

    Afghanistan-Pakistan border, Steve McCurry. 29

  • 8/16/2019 15 - Phylogeny

    47/171

    Spatial Transformation

    for the Dissimilarit

    Correspondences.

    Robust Estimationof Affine Transf.

    Key Points.

    x0

    y0

    =

    a b

    c d

    x

    y

    +

    txty

    Afghanistan-Pakistan border, Steve McCurry. 29

    C l T f

  • 8/16/2019 15 - Phylogeny

    48/171

    Color Transformation

    for the Dissimilarit

    V-J Day, Times Square, Alfred Eisenstaedt 30

    C l T f

  • 8/16/2019 15 - Phylogeny

    49/171

    Color Transformation

    for the Dissimilarit

    31

    C l T f

  • 8/16/2019 15 - Phylogeny

    50/171

    Color Transformation

    for the Dissimilarit

    Brightness / Contrast ~ Same mean and stddev.

    32

    D l

  • 8/16/2019 15 - Phylogeny

    51/171

    Dissimilarity:

    Com ression

    • Use the quantization table of the jpeg of Bto compress T(A).

    33

  • 8/16/2019 15 - Phylogeny

    52/171

  • 8/16/2019 15 - Phylogeny

    53/171

    Tree Construction

    35

  • 8/16/2019 15 - Phylogeny

    54/171

    Tree Construction

    • Local decisions of direction on pairs ofimages is not a good idea...

    35

  • 8/16/2019 15 - Phylogeny

    55/171

    Tree Construction

    • Local decisions of direction on pairs ofimages is not a good idea...

    • Proposition: we want a MST.

    35

  • 8/16/2019 15 - Phylogeny

    56/171

    Tree Construction

    • Local decisions of direction on pairs ofimages is not a good idea...

    • Proposition: we want a MST....but we have a complete directed graph.

    35

    MST f di d h

  • 8/16/2019 15 - Phylogeny

    57/171

    MST of directed graphs

    in the LiteratureThe Optimum Branching  problem finds the

    MST of a directed graph for a given root.

    In our context, it would have to be applied

    to each vertex as a root, and the final

    complexity in our scenario would be O(n^3).

    It also uses a Fibonacci Heap.

    36

  • 8/16/2019 15 - Phylogeny

    58/171

    Oriented Kruskal

    37

  • 8/16/2019 15 - Phylogeny

    59/171

    Oriented Kruskal

    Our method runs once, and finds both the

    root and structure simultaneously.

    It has an complexity ! we need

    to sort all edges of the complete graph.

    It requires the Union-Find data structure.

    38

    O(n2 log n)n2

  • 8/16/2019 15 - Phylogeny

    60/171

     

    ! 1 M[6,1] = 12 Select Edge ( 1 6 )

    ! 2 M[4,5] = 15 Select Edge ( 5 4 )

    " 3 M[4,1] = 16 Test II: Root(1) = 6

    ! 4 M[5,2] = 18 Select Edge ( 2 5 )

    " 5 M[6,5] = 19 Test II: Root(5) = 4

    ! 6 M[6,3] = 22 Select Edge ( 3 6 )

    " 7 M[2,4] = 23 Test I: Root(2) = Root(4)

    ! 8 M[4,6] = 27 Select Edge ( 6 4 )

     Algorithm Steps

    [ 6 , 5 , 6 , 4 , 4 , 4 ]Reconstructed Tree

     

    4

    5

    2

    6

    1 3

    1

    2

    4

    8

    6

    M   1 2 3 4 5 6

    1   - 31 57 37 45 49

    2   31 - 33 23 29 32

    3   51 41 - 42 37 38

    4   16 36 28 - 15 27

    5   35 18 54 30 - 54

    6   12 40 22 60 19 -

    Dissimilarity Matrix

    Construction Example

    E l ti

  • 8/16/2019 15 - Phylogeny

    61/171

    Evaluation:

    Comparing Trees

    40

    E l ti

  • 8/16/2019 15 - Phylogeny

    62/171

    Evaluation:

    Comparing Trees

    40

    E l ti

  • 8/16/2019 15 - Phylogeny

    63/171

    Evaluation:

    Comparing Trees

    40

    E l ti

  • 8/16/2019 15 - Phylogeny

    64/171

    Evaluation:

    Comparing Trees

    40

  • 8/16/2019 15 - Phylogeny

    65/171

    IPT Experiments

    41

  • 8/16/2019 15 - Phylogeny

    66/171

    IPT Experiments

    • Experimental Setup.

    41

  • 8/16/2019 15 - Phylogeny

    67/171

    IPT Experiments

    • Experimental Setup.

    •Complete Trees.

    41

  • 8/16/2019 15 - Phylogeny

    68/171

    IPT Experiments

    • Experimental Setup.

    •Complete Trees.

    • Missing Nodes.

    41

  • 8/16/2019 15 - Phylogeny

    69/171

    IPT Experiments

    • Experimental Setup.

    •Complete Trees.

    • Missing Nodes.• Missing Root.

    41

  • 8/16/2019 15 - Phylogeny

    70/171

    IPT Experiments

    • Experimental Setup.

    •Complete Trees.

    • Missing Nodes.• Missing Root.

    • Missing Internal Nodes.

    41

  • 8/16/2019 15 - Phylogeny

    71/171

    IPT Experiments

    • Experimental Setup.

    •Complete Trees.

    • Missing Nodes.• Missing Root.

    • Missing Internal Nodes.• Real ND sets from the Web.

    41

  • 8/16/2019 15 - Phylogeny

    72/171

    IPT Experiments

    • Experimental Setup.

    •Complete Trees.

    • Missing Nodes.• Missing Root.

    • Missing Internal Nodes.• Real ND sets from the Web.

    • A first look into Forests.41

    Experimental Setup

  • 8/16/2019 15 - Phylogeny

    73/171

    Experimental Setup

    42

    Experimental Setup

  • 8/16/2019 15 - Phylogeny

    74/171

    Experimental Setup

    • 50 raw images from UCID.

    42

    Experimental Setup

  • 8/16/2019 15 - Phylogeny

    75/171

    Experimental Setup

    • 50 raw images from UCID.• Trees with 10, 20, 30, 40, and 50 nodes.

    42

    Experimental Setup

  • 8/16/2019 15 - Phylogeny

    76/171

    Experimental Setup

    • 50 raw images from UCID.• Trees with 10, 20, 30, 40, and 50 nodes.

    • For every size, 50 random tree topologies, eachwith 10 different random parameters.

    42

    Experimental Setup

  • 8/16/2019 15 - Phylogeny

    77/171

    Experimental Setup

    • 50 raw images from UCID.• Trees with 10, 20, 30, 40, and 50 nodes.

    • For every size, 50 random tree topologies, eachwith 10 different random parameters.

    • ND set created with affine transformation, crop,brightness-contrast-gamma on each channel and

    compression. We use ImageMagick.

    42

    Experimental Setup

  • 8/16/2019 15 - Phylogeny

    78/171

    Experimental Setup

    • 50 raw images from UCID.• Trees with 10, 20, 30, 40, and 50 nodes.

    • For every size, 50 random tree topologies, eachwith 10 different random parameters.

    • ND set created with affine transformation, crop,brightness-contrast-gamma on each channel and

    compression. We use ImageMagick.

    • Dissimilarity construction with OpenCV andlibjpg: affine transformation, brightness-contrast by

    channel, and compression.42

    Complete Trees

  • 8/16/2019 15 - Phylogeny

    79/171

    Complete Trees

    !"

    $!"

    %!"

    &!"

    '!"

    (!"

    )!"

    *!"

    +!"

    ,!"

    $!!"

    $! %! &! '! (!

       -   .   /   0   1 

    -231

    !""# %&'() *(+,() -./()#01

    43

    Complete Trees

  • 8/16/2019 15 - Phylogeny

    80/171

    Complete Trees

    !"

    $!"

    %!"

    &!"

    '!"

    (!"

    )!"

    *!"

    +!"

    ,!"

    $!!"

    $! %! &! '! (!

       -   .   /   0   1 

    -231

    !""# %&'() *(+,() -./()#01

    If the correct root is at depth zero, we identified the root of thetree. Here, regardless of the tree size, the average depth at

    which our solution finds the correct root is lower than 0.03. 43

  • 8/16/2019 15 - Phylogeny

    81/171

    Missing Links

    • On the wild, it is unrealistic to expect tohave all the nodes of the tree.

    • How to handle missing links?How do we evaluate the algorithm?

    44

  • 8/16/2019 15 - Phylogeny

    82/171

    Missing Nodes

    12

    65

    1

    7   8 9

    43

    1110

    Using Ancestry Information

    2

    12

    5

    1

    7

    11

    2

    45

    Missing only

  • 8/16/2019 15 - Phylogeny

    83/171

    Missing only

    Internal Nodes

    !"

    $!"

    %!"

    &!"

    '!"

    (!"

    )!"

    *!"

    +!"

    ,!"

    $!!"

    ( $! $( %! %( &! &( '! '( (!

       -   .   /   0   1 

    -231

    !""# %&'() *(+,() -./()#01

    46

    Missing Root and

  • 8/16/2019 15 - Phylogeny

    84/171

    Missing Root and

    Internal Nodes

    !"

    $!"

    %!"

    &!"

    '!"

    (!"

    )!"

    *!"

    +!"

    ,!"

    $!!"

    ( $! $( %! %( &! &( '! '( (!

       -   .   /   0   1 

    -231

    !""# %&'() *(+,() -./()#01

    47

    Real ND sets from Web

  • 8/16/2019 15 - Phylogeny

    85/171

    Real ND sets from Web

    48

    How to Evaluate results?

  • 8/16/2019 15 - Phylogeny

    86/171

    How to Evaluate results?• Since we do not know the ground truths, we

    evaluate the stability of reconstruction.

    49

    How to Evaluate results?

  • 8/16/2019 15 - Phylogeny

    87/171

    How to Evaluate results?• Since we do not know the ground truths, we

    evaluate the stability of reconstruction.

    Er1: one if the new node IB is not a child of its

    generating node IA.

    49

    How to Evaluate results?

  • 8/16/2019 15 - Phylogeny

    88/171

    How to Evaluate results?• Since we do not know the ground truths, we

    evaluate the stability of reconstruction.

    Er1: one if the new node IB is not a child of its

    generating node IA.

    Er2: one if the structure of the tree relating the

    nodes in the original set changes with the insertionof the new node IB in the set.

    49

    How to Evaluate results?

  • 8/16/2019 15 - Phylogeny

    89/171

    How to Evaluate results?• Since we do not know the ground truths, we

    evaluate the stability of reconstruction.

    Er1: one if the new node IB is not a child of its

    generating node IA.

    Er2: one if the structure of the tree relating the

    nodes in the original set changes with the insertionof the new node IB in the set.

    Er3: one if the new node IB appears as a father ofanother node on the original tree.

    49

    How to Evaluate results?

  • 8/16/2019 15 - Phylogeny

    90/171

    • Since we do not know the ground truths, weevaluate the stability of reconstruction.

    Er1: one if the new node IB is not a child of its

    generating node IA.

    Er2: one if the structure of the tree relating the

    nodes in the original set changes with the insertionof the new node IB in the set.

    Er3: one if the new node IB appears as a father ofanother node on the original tree.

    Er4: is one if the root of the reconstructed tree ofthe original set is different from the root of the

    reconstructed tree of the set augmented with IB.

    49

    How to Evaluate results?

  • 8/16/2019 15 - Phylogeny

    91/171

    • Since we do not know the ground truths, weevaluate the stability of reconstruction.

    Er1: one if the new node IB is not a child of its

    generating node IA.

    Er2: one if the structure of the tree relating the

    nodes in the original set changes with the insertionof the new node IB in the set.

    Er3: one if the new node IB appears as a father ofanother node on the original tree.

    Er4: is one if the root of the reconstructed tree ofthe original set is different from the root of the

    reconstructed tree of the set augmented with IB.

    P: one if the reconstructed tree is perfectcompared to the original tree (Er1 = Er2 = 0).

    49

    How to Evaluate results?

  • 8/16/2019 15 - Phylogeny

    92/171

    • Since we do not know the ground truths, weevaluate the stability of reconstruction.

    [ 6 , 5 , 6 , 4 , 4 , 4 ]Initial Tree

    4

    5 6

    1 3

    4

    5

    2

    6

    1

    3

    5

    7

     I 7  =  T ~ β( I 5)

    Select one Node and  Artificially Generate a Direct Descendant 

    [ 7 , 5 , 6 , 4 , 4 , 4 , 6 ]

    Tree After Inserting Node 7 

    2

    7

    Reconstruction Errors

    Er1   1

    Er2   1

    Er3   1

    Er4   0

    Suc ess

    P 0

    Er1: one if the new node IB is not a child of its

    generating node IA.

    Er2: one if the structure of the tree relating the

    nodes in the original set changes with the insertionof the new node IB in the set.

    Er3: one if the new node IB appears as a father ofanother node on the original tree.

    Er4: is one if the root of the reconstructed tree ofthe original set is different from the root of the

    reconstructed tree of the set augmented with IB.

    P: one if the reconstructed tree is perfectcompared to the original tree (Er1 = Er2 = 0).

    49

     

  • 8/16/2019 15 - Phylogeny

    93/171

    Real ND setsORIENTED K RUSKAL I PT  ALGORITHM R ESULTS FOR THE  U NCONSTRAINED  S CENARIO.

    Description   # of Cases %Er1   %Er2   %Er3   %Er4   %P

    TG1   Iranian Missiles 90 40.0% 11.1% 11.1% 0.0% 55.6%

    TG2   Bush Reading 95 17.9% 3.2% 3.2% 0.0% 81.1%

    TG3   WTC Tourist 95 25.3% 6.3% 6.3% 1.1% 71.6%

    TG4   BP Oil Spill 100 25.0% 0.0% 0.0% 0.0% 75.0%

    TG5   Israeli-Palestinian Peace Talks 95 21.1% 7.4% 7.4% 0.0% 75.8%

    TG6   Criminal Record 90 41.1% 13.3% 13.3% 0.0% 54.4%

    TG7   Palin and Rifle 100 17.0% 2.0% 2.0% 0.0% 81.0%

    TG8   Beatles Rubber 100 8.0% 9.0% 9.0% 1.0% 85.0%

    TG9   Kerry and Fonda 80 21.3% 13.8% 13.8% 0.0% 68.8%

    TG10   OJ Simpson 90 18.9% 2.2% 2.2% 0.0% 78.9%

    Average   93.5 23.5% 6.8% 6.8% 0.2% 72.7%

    50

    What’s up with this

  • 8/16/2019 15 - Phylogeny

    94/171

    What s up with this

    near du licate set?

    51

    What’s up with this

  • 8/16/2019 15 - Phylogeny

    95/171

    What s up with this

    near du licate set?

    52

    What’s up with this

  • 8/16/2019 15 - Phylogeny

    96/171

    What s up with this

    near du licate set?

    52

    What’s up with this

  • 8/16/2019 15 - Phylogeny

    97/171

    What s up with this

    near du licate set?

    52

    Cl

  • 8/16/2019 15 - Phylogeny

    98/171

    Close-up

    53

    Cl

  • 8/16/2019 15 - Phylogeny

    99/171

    Close-up

    54

    Fi k F

  • 8/16/2019 15 - Phylogeny

    100/171

    First peek at Forests

    • Forests (multiple co-existing trees) are areal case in real applications.

    • Can our method be modified to findmultiple trees?

    55

  • 8/16/2019 15 - Phylogeny

    101/171

    Video Phylogeny Tree:VPT

    Vid Ph l T

  • 8/16/2019 15 - Phylogeny

    102/171

    Video Phylogeny Tree

    • We ignore the sound track.

    • We use only static image content.

    57

    Vid Ph l T

  • 8/16/2019 15 - Phylogeny

    103/171

    Video Phylogeny Tree

    • We ignore the sound track.

    • We use only static image content.

    Why not get one frame, and

    use the IPT as the VPT?

    57

    Vid Ph l T

  • 8/16/2019 15 - Phylogeny

    104/171

    Video Phylogeny Tree

    • We ignore the sound track.

    • We use only static image content.

    Why not get one frame, and

    use the IPT as the VPT?

    The IPTs of frames aredifferent along the video!!!

    57

    Vid Ph l T

  • 8/16/2019 15 - Phylogeny

    105/171

    Video Phylogeny Tree

    • We ignore the sound track.

    • We use only static image content.

    Why not get one frame, and

    use the IPT as the VPT?

    The IPTs of frames aredifferent along the video!!!

    But why?57

  • 8/16/2019 15 - Phylogeny

    106/171

    58

  • 8/16/2019 15 - Phylogeny

    107/171

    58

  • 8/16/2019 15 - Phylogeny

    108/171

    58

  • 8/16/2019 15 - Phylogeny

    109/171

    58

  • 8/16/2019 15 - Phylogeny

    110/171

    58

  • 8/16/2019 15 - Phylogeny

    111/171

    58

  • 8/16/2019 15 - Phylogeny

    112/171

    58

  • 8/16/2019 15 - Phylogeny

    113/171

  • 8/16/2019 15 - Phylogeny

    114/171

    58

  • 8/16/2019 15 - Phylogeny

    115/171

    58

  • 8/16/2019 15 - Phylogeny

    116/171

    58

  • 8/16/2019 15 - Phylogeny

    117/171

  • 8/16/2019 15 - Phylogeny

    118/171

    58

  • 8/16/2019 15 - Phylogeny

    119/171

    58

  • 8/16/2019 15 - Phylogeny

    120/171

    58

  • 8/16/2019 15 - Phylogeny

    121/171

    58

    Different IPTs

  • 8/16/2019 15 - Phylogeny

    122/171

    Different IPTs

    • Different quality over time, for example:

    - black frames,

    - blur,- compression artifacts,

    -dynamic range.

    • Which (if any) is the right one?59

    A Few Approaches

  • 8/16/2019 15 - Phylogeny

    123/171

    A Few Approaches

    60

    A Few Approaches

  • 8/16/2019 15 - Phylogeny

    124/171

    A Few Approaches

    • Expected result from a single frame IPT (baseline).

    60

    A Few Approaches

  • 8/16/2019 15 - Phylogeny

    125/171

    A Few Approaches

    • Expected result from a single frame IPT (baseline).

    • Minimum dissimilarity matrix followed by IPT.

    60

    A Few Approaches

  • 8/16/2019 15 - Phylogeny

    126/171

    A Few Approaches

    • Expected result from a single frame IPT (baseline).

    • Minimum dissimilarity matrix followed by IPT.

    • Average dissimilarity matrix followed by IPT.

    60

    A Few Approaches

  • 8/16/2019 15 - Phylogeny

    127/171

    A Few Approaches

    • Expected result from a single frame IPT (baseline).

    • Minimum dissimilarity matrix followed by IPT.

    • Average dissimilarity matrix followed by IPT.

    • Reconciliation Tree.

    60

    Single-Frame

  • 8/16/2019 15 - Phylogeny

    128/171

    Ex ectation

    61

    Single-Frame

  • 8/16/2019 15 - Phylogeny

    129/171

    Ex ectation

    • Calculate IPT on each frame.

    61

    Single-Frame

  • 8/16/2019 15 - Phylogeny

    130/171

    Ex ectation

    • Calculate IPT on each frame.

    61

    Single-Frame

  • 8/16/2019 15 - Phylogeny

    131/171

    Ex ectation

    • Calculate IPT on each frame.

    • Calculate Expectation of metrics, but does notreconstruct a VPT (Video Phylogeny Tree).

    61

    Min / Avera e

  • 8/16/2019 15 - Phylogeny

    132/171

    62

    Min / Avera e

  • 8/16/2019 15 - Phylogeny

    133/171

    •Sample frames.

    62

    Min / Avera e

  • 8/16/2019 15 - Phylogeny

    134/171

    •Sample frames.

    • Calculate Dissimilarity Matrix on eachsynchronized frames.

    62

    Min / Avera e

  • 8/16/2019 15 - Phylogeny

    135/171

    •Sample frames.

    • Calculate Dissimilarity Matrix on eachsynchronized frames.

    • Create a new Dissimilarity Matrix using theframe’s Dissimilarities

    - min,- average,

    -normalized min,

    - normalized average.

    62

    Min / Avera e

  • 8/16/2019 15 - Phylogeny

    136/171

    •Sample frames.

    • Calculate Dissimilarity Matrix on eachsynchronized frames.

    • Create a new Dissimilarity Matrix using theframe’s Dissimilarities

    - min,- average,

    -normalized min,

    - normalized average.

    • Construct VPT using oriented Kruskal onthis new Matrix.

    62

    Reconciliation

  • 8/16/2019 15 - Phylogeny

    137/171

    A roach

    63

    Reconciliation

  • 8/16/2019 15 - Phylogeny

    138/171

    A roach• Sample frames.

    63

    Reconciliation

  • 8/16/2019 15 - Phylogeny

    139/171

    A roach• Sample frames.

    • Calculate IPT on each frame.

    63

    Reconciliation

  • 8/16/2019 15 - Phylogeny

    140/171

    A roach• Sample frames.

    • Calculate IPT on each frame.• Reconcile the frame’s IPTs into the VPT.

    - Build Reconciliation Matrix.

    - Apply Tree Reconciliation Algorithm

    63

    Reconciliation Matrix

  • 8/16/2019 15 - Phylogeny

    141/171

    Reconciliation Matrix

    Algorithm 1   Reconciliation Matrix.

    Require:   number of near-duplicate videos,  nRequire:  number of selected frames,  f Require:   2-d vector,  t, with the  f   phylogeny trees previously calculated

    1:   for i ∈ [1..n]  do    Initialization2:   for j  ∈ [1..n]  do3:   P [i, j] ←  04:   end for5:   end for6:   for i ∈ [1..f ]  do    Creating the matrix  P 7:   for j  ∈ [1..n]  do8:   P [ j, t[i][ j]] = P [ j, t[i][ j]] + 19:   end for

    10:   end for11:   return  P      Returning the parenthood matrix  P 

    64

    Tree Reconciliation Alg.

     

    Al i h 2 T R ili i

  • 8/16/2019 15 - Phylogeny

    142/171

    Algorithm 2  Tree Reconciliation.

    Require:   number of near-duplicate videos,  nRequire:   matrix,  P , from Algorithm 1

    1:   for i  ∈  [1..n]  do    Tree initialization2:   tree[i] ← i3:   end for4:   sorted ←   sort positions  (i, j)  of  P   into nonincreasing order

      List of edges sorted from the most to the least common5:   r  ← 0     Initially, the final root  r   is not defined6:   nedges  ← 0

    7:   for each  position  (i, j)  ∈  sorted  do 

     Testing each edge in order8:   if  r = 0  and   i =  j   then    Defining the root of the tree9:   r  ← i

    10:   end if 11:   if  i  6=  r   then    If   i   is not the root of the tree12:   if  Root(i) 6=  Root( j)  then13:   if  Root( j) =  j   then14:   tree[ j]  ←  i

    15:   nedges  ← nedges + 116:   if  nedges  =  n − 1  then     If the tree is complete17:   return  tree    Returning the final VPT18:   end if 19:   end if 20:   end if 21:   end if 22:   end for

    65

  • 8/16/2019 15 - Phylogeny

    143/171

    Example

  • 8/16/2019 15 - Phylogeny

    144/171

    [ 1 , 1 , 1 , 1 , 3 , 4 ]

    1

    2 43

    5 6

    Frame 1

    [ 4 , 2 , 2 , 2 , 2 , 5 ]

    2

    3 54

    1 6

    Frame 2

    [ 1 , 1 , 1 , 2 , 2 , 3 ]

    1

    2 3

    6

    Frame 3

    4 5

    [ 4 , 4 , 2 , 4 , 1 , 2 ]

    4

    1

    Frame 4

    5

    2

    3 6

    [ 1 , 1 , 6 , 2 , 2 , 1 ]

    1

    2 6

    3

    Frame 5

    4 5

    [ 1 , 4 , 1 , 1 , 3 , 1 ]

    1

    3 64

    25

    Frame 6

    P 1 2 3 4 5 6

    1 4 2

    2 3 1 2

    3 3 2 1

    4 2 3 1

    5 1 3 2

    6 2 1 1 1 1

          C      h      i      l      d

    Parent

    4 (1,1)   !

    3 (2,1)   !

    3 (3,1)   !

    3 (4,2)   !

    3 (5,2)   !

    2 (1,4)   "

    2 (2,4)   "

    2 (3,2)   "

    2 (4,1)   "2 (5,3)   "

    2 (6,1)   !

    Edges

    [ 1 , 1 , 1 , 2 , 2 , 1 ]

    1

    2 63

    Final Video Phylogeny Tree

    54

    Example

    Experimental results

  • 8/16/2019 15 - Phylogeny

    145/171

    68

    Experimental results

  • 8/16/2019 15 - Phylogeny

    146/171

    •Limited experiments in this paper:

    - Ignored temporal cropping,- Ignored video compression on dissimilarities,- 16 Videos (Super Bowl Commercials 2011),

    - 16 trees,- 10 near-duplicates per tree.

    68

    Experimental results

  • 8/16/2019 15 - Phylogeny

    147/171

    • Limited experiments in this paper:- Ignored temporal cropping,- Ignored video compression on dissimilarities,- 16 Videos (Super Bowl Commercials 2011),

    - 16 trees,- 10 near-duplicates per tree.

    • Transformations using mencoder.

    68

    Experimental results

  • 8/16/2019 15 - Phylogeny

    148/171

    • Limited experiments in this paper:- Ignored temporal cropping,- Ignored video compression on dissimilarities,- 16 Videos (Super Bowl Commercials 2011),

    - 16 trees,- 10 near-duplicates per tree.

    • Transformations using mencoder.

    • Sampling frames and sync by ffmpeg.

    68

  • 8/16/2019 15 - Phylogeny

    149/171

    Transformations 

  • 8/16/2019 15 - Phylogeny

    150/171

    Transformations

    Table ITRANSFORMATIONS AND THEIR OPERATIONAL RANGES FOR CREATINGTHE CONTROLLED DATA SET.

    Transformation Oper. Range

    (1) Global Resampling/Scaling (Up/Down)   [90%, 110%](2) Scaling by axis   [90%, 110%](3) Cropping   [0%, 5%](4) Brightness Adjustment   [−10%, 10%](5) Contrast Adjustment   [−10%, 10%](6) Gamma Correction   [0.9, 1.1]

    We used mencoder to generate theNearDuplicates, with these transformations and

    ranges:

    69

    Comparing Trees

  • 8/16/2019 15 - Phylogeny

    151/171

    Comparing Trees

    70

    Results of Approaches

  • 8/16/2019 15 - Phylogeny

    152/171

    Results of Approaches

    Table IIAVERAGE RESULTS FOR THE  16× 16  TEST CASES UNDER CONSIDERATION

    FOR THE PROPOSED  VPT   METHODS.

    Method Root Depth Edges Leaves Ancestry(E) Single Frame 76.5% 0.382 54.2% 67.7% 58.6%

    (1) Min 59.0% 0.926 49.6% 64.1% 50.8%(2) Min-Norm 68.0% 0.605 51.3% 66.4% 54.2%

    (3) Avg 85.6% 0.215 56.6% 70.3% 62.0%(4) Avg-Norm 85.9% 0.203 58.0% 72.4% 64.5%(5) Reconc. Tree 91.0% 0.098 65.8% 77.7% 70.4%

    (5)/(E)   Boost   18.9%   74.3%   21.4%   14.7%   20.1%

    71

    Experimental Results

  • 8/16/2019 15 - Phylogeny

    153/171

    Table IIIRESULTS FOR THE TRE E  RECONCILIATION APPROACH USING  1 6

    DIFFERENT TREES.

    Video Root Depth Edges Leaves Ancestry

    V  01   100.0% 0.000 68.1% 77.9% 73.4%V  02   87.5% 0.125 66.9% 76.0% 68.8%V  03   75.0% 0.312 56.9% 73.7% 57.8%V  04   81.2% 0.188 57.5% 68.2% 60.7%V  05   93.8% 0.062 69.4% 81.3% 73.9%V  06   93.8% 0.125 66.2% 77.7% 72.7%

    V  07   100.0% 0.000 73.1% 83.2% 79.5%V  08   93.8% 0.062 59.4% 75.0% 66.1%V  09   100.0% 0.000 70.6% 80.2% 73.1%V  10   100.0% 0.000 65.6% 75.9% 72.3%V  11   81.2% 0.188 64.4% 80.0% 69.8%V  12   100.0% 0.000 68.7% 80.2% 76.4%V  13   87.5% 0.125 75.0% 82.5% 77.7%V  14   100.0% 0.000 69.4% 78.1% 72.5%V  15   81.2% 0.188 56.9% 72.8% 64.2%V  16   81.2% 0.188 65.0% 80.2% 67.2%

    Average   91.0% 0.098 65.8% 77.7% 70.4%Std Dev   8.8% 0.097 5.6% 4.0% 6.0%

    72

    Limitations of 

  • 8/16/2019 15 - Phylogeny

    154/171

     frame-based VPT

    73

    Limitations of 

  • 8/16/2019 15 - Phylogeny

    155/171

     frame-based VPT• Does not use sound.

    73

    Limitations of 

  • 8/16/2019 15 - Phylogeny

    156/171

     frame-based VPT• Does not use sound.• Ignores temporal information of the visual content.

    73

    Limitations of 

  • 8/16/2019 15 - Phylogeny

    157/171

     frame-based VPT• Does not use sound.• Ignores temporal information of the visual content.

    • Requires sync frames!- This is actually a very complicated issue in Video, and

    some codecs are finickier than others.

    - If we allow change in FPS + temporal crop, it might beimpossible to fulfill this requisite.

    73

  • 8/16/2019 15 - Phylogeny

    158/171

    Discussion

    What’s next?

  • 8/16/2019 15 - Phylogeny

    159/171

    75

    What’s next?

  • 8/16/2019 15 - Phylogeny

    160/171

    • Better study of the effect of the family oftransformations on images and video.

    75

    What’s next?

  • 8/16/2019 15 - Phylogeny

    161/171

    • Better study of the effect of the family oftransformations on images and video.

    • Use Reencoding for better video Dissimilarities.

    75

    What’s next?

  • 8/16/2019 15 - Phylogeny

    162/171

    • Better study of the effect of the family oftransformations on images and video.

    • Use Reencoding for better video Dissimilarities.

    • Global dissimilarity for video (instead of framebased).

    75

    What’s next?

  • 8/16/2019 15 - Phylogeny

    163/171

    • Better study of the effect of the family oftransformations on images and video.

    • Use Reencoding for better video Dissimilarities.

    • Global dissimilarity for video (instead of framebased).

    • Reconstructing Forests.

    75

    What’s next?

  • 8/16/2019 15 - Phylogeny

    164/171

    • Better study of the effect of the family oftransformations on images and video.

    • Use Reencoding for better video Dissimilarities.

    • Global dissimilarity for video (instead of framebased).

    • Reconstructing Forests.

    • Large Scale Scenario.75

    Other Domains

  • 8/16/2019 15 - Phylogeny

    165/171

    76

    Other Domains

  • 8/16/2019 15 - Phylogeny

    166/171

    • Software Development.• Phylogeny of code.

    76

    Other Domains

  • 8/16/2019 15 - Phylogeny

    167/171

    • Software Development.• Phylogeny of code.

    • Computational Linguistics.• Phylogeny of Historical Documents.

    76

    Other Domains

  • 8/16/2019 15 - Phylogeny

    168/171

    • Software Development.• Phylogeny of code.

    • Computational Linguistics.• Phylogeny of Historical Documents.

    • File Carving through Phylogeny Algorithms.

    76

    Other Domains

  • 8/16/2019 15 - Phylogeny

    169/171

    • Software Development.• Phylogeny of code.

    • Computational Linguistics.• Phylogeny of Historical Documents.

    • File Carving through Phylogeny Algorithms.

    • These cases are closer to the traditionalBiological application, with metrics.

    76

    Other Domains

  • 8/16/2019 15 - Phylogeny

    170/171

    • Software Development.• Phylogeny of code.

    • Computational Linguistics.• Phylogeny of Historical Documents.

    • File Carving through Phylogeny Algorithms.

    • These cases are closer to the traditionalBiological application, with metrics.

    Beyond the Naïve Plagiarism Detection.76

  • 8/16/2019 15 - Phylogeny

    171/171

    Thank you.