Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins.
Stochastic Approach for Link Structure Analysis
(SALSA)
Presented by Adam Simkins
SALSA
• Created by Lempel and Moran in 2000
• Combination of HITS and PageRank
SALSA’s similarities to HITS and PageRank
• SALSA uses authority and hub score
• SALSA creates a neighborhood graph using authority and hub pages and links
SALSA’s differences from HITS and PageRank
• The SALSA method creates a bipartite graph from the authority and hub pages in the neighborhood graph.
• One set contains hub pages
• One set contains authority pages
• Each page may be located in both sets
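The construction above can be sketched in a few lines of Python. The edge list below is a hypothetical example, not taken from the slides' figure (which is not reproduced in this transcript); it was chosen only to be consistent with the hub and authority sets named later in the deck. The function name `bipartite_sets` is likewise an illustrative assumption.

```python
# Hypothetical directed edges (source page, target page) of a neighborhood graph.
# ASSUMPTION: this edge list is illustrative; the original figure is not available.
edges = [(1, 3), (1, 6), (2, 1), (3, 6), (6, 3), (6, 5), (10, 6)]


def bipartite_sets(edges):
    """Split pages into the two sides of the bipartite graph.

    A page with at least one outlink goes in the hub set;
    a page with at least one inlink goes in the authority set.
    """
    hubs = sorted({src for src, _ in edges})
    authorities = sorted({dst for _, dst in edges})
    return hubs, authorities


hubs, authorities = bipartite_sets(edges)

# Pages that have both inlinks and outlinks appear on both sides.
both = sorted(set(hubs) & set(authorities))

print("hub set:", hubs)
print("authority set:", authorities)
print("in both sets:", both)
```

Each edge of the neighborhood graph becomes an edge from the hub copy of its source page to the authority copy of its target page.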
Neighborhood Graph N
Bipartite Graph G of Neighborhood Graph N
Markov Chains
• Two matrices formed from bipartite graph G
• A hub Markov chain with matrix H
• An authority Markov chain with matrix A
Where does SALSA fit in?
• Matrices H and A can be derived from the adjacency matrix L used in the HITS and PageRank methods
• HITS uses the unweighted matrix L
• PageRank uses a row weighted version of matrix L
• SALSA uses both row and column weighting
How are H and A computed?
• Let $L_r$ be L with each nonzero row divided by its row sum
• Let $L_c$ be L with each nonzero column divided by its column sum
• H, SALSA’s hub matrix, consists of the nonzero rows and columns of $L_r L_c^T$
• A, SALSA’s authority matrix, consists of the nonzero rows and columns of $L_c^T L_r$
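As a minimal sketch of the recipe above, the following Python builds $L_r$, $L_c$, H, and A with exact fractions. The adjacency matrix is a hypothetical example (pages 1, 2, 3, 5, 6, 10 in that order), assumed here for illustration and not copied from the slides; `salsa_matrices` is an illustrative name.

```python
from fractions import Fraction

# ASSUMPTION: hypothetical adjacency matrix; rows/columns are pages in this order.
pages = [1, 2, 3, 5, 6, 10]
L = [
    [0, 0, 1, 0, 1, 0],  # page 1 links to 3 and 6
    [1, 0, 0, 0, 0, 0],  # page 2 links to 1
    [0, 0, 0, 0, 1, 0],  # page 3 links to 6
    [0, 0, 0, 0, 0, 0],  # page 5 has no outlinks
    [0, 0, 1, 1, 0, 0],  # page 6 links to 3 and 5
    [0, 0, 0, 0, 1, 0],  # page 10 links to 6
]


def salsa_matrices(L):
    n = len(L)
    row_sums = [sum(row) for row in L]
    col_sums = [sum(L[i][j] for i in range(n)) for j in range(n)]
    zero = Fraction(0)
    # L_r: each nonzero row divided by its row sum.
    Lr = [[Fraction(L[i][j], row_sums[i]) if row_sums[i] else zero
           for j in range(n)] for i in range(n)]
    # L_c: each nonzero column divided by its column sum.
    Lc = [[Fraction(L[i][j], col_sums[j]) if col_sums[j] else zero
           for j in range(n)] for i in range(n)]
    # Full products: L_r L_c^T and L_c^T L_r.
    H_full = [[sum(Lr[i][k] * Lc[j][k] for k in range(n)) for j in range(n)]
              for i in range(n)]
    A_full = [[sum(Lc[k][i] * Lr[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
    # Keep only the nonzero rows and columns.
    hub_idx = [i for i in range(n) if row_sums[i]]
    auth_idx = [j for j in range(n) if col_sums[j]]
    H = [[H_full[i][j] for j in hub_idx] for i in hub_idx]
    A = [[A_full[i][j] for j in auth_idx] for i in auth_idx]
    return H, A, hub_idx, auth_idx


H, A, hub_idx, auth_idx = salsa_matrices(L)
hub_pages = [pages[i] for i in hub_idx]
auth_pages = [pages[j] for j in auth_idx]
```

Both resulting matrices are row-stochastic (every row sums to 1), which is what lets them be read as transition matrices of Markov chains.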
Eigenvectors
• $Av = \lambda v$
• $v^T A = \lambda v^T$
• Numerically: Power Method
The Power Method
• $x_{k+1} = A x_k$
• $x_{k+1}^T = x_k^T A$
• Converges to the dominant eigenvector ($\lambda = 1$).
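The iteration can be illustrated with a short Python sketch. It uses the left-hand form $x_{k+1}^T = x_k^T P$, which for a row-stochastic matrix yields the stationary vector of the Markov chain. The 2x2 matrix `P`, the tolerance, and the function name `power_method` are assumptions made for this example only.

```python
def power_method(P, tol=1e-12, max_iter=10_000):
    """Iterate x^T <- x^T P until successive iterates agree to within tol."""
    n = len(P)
    x = [1.0 / n] * n  # uniform starting vector
    for _ in range(max_iter):
        # Left multiplication: nxt[j] = sum_i x[i] * P[i][j].
        nxt = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [v / total for v in nxt]  # keep the entries summing to 1
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# ASSUMPTION: a small irreducible row-stochastic matrix chosen for illustration.
P = [[0.5, 0.5],
     [0.25, 0.75]]
pi = power_method(P)
print(pi)
```

For this `P` the stationary vector is (1/3, 2/3), which can be checked by hand from $\pi^T P = \pi^T$.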
The Power Method
• Matrices H and A must be irreducible for the power method to converge to a unique eigenvector given any starting value
• If our bipartite graph G is connected, then both H and A are irreducible
• If G is not connected, then the power method on H and A does not converge to a unique dominant eigenvector
Our Graph is not connected!
• In our example the graph is clearly not connected: page 2 in the hub set is connected only to page 1 in the authority set, and vice versa.
• H and A are reducible and therefore contain multiple irreducible connected components
Connected Components
• H contains two connected components, C = {2} and D = {1, 3, 6, 10}
• A contains two connected components, E = {1} and F = {3, 5, 6}
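One way to find such components is a breadth-first search over the bipartite graph, sketched below in Python. The edge list is a hypothetical one, assumed for illustration because the slides' figure is not reproduced here; it was chosen so that the result matches the components C, D, E, and F listed above. The function name `bipartite_components` is also illustrative.

```python
from collections import defaultdict, deque

# ASSUMPTION: hypothetical edge list consistent with the components on this slide.
edges = [(1, 3), (1, 6), (2, 1), (3, 6), (6, 3), (6, 5), (10, 6)]


def bipartite_components(edges):
    """Return a list of (hub pages, authority pages) per connected component."""
    adj = defaultdict(set)
    for src, dst in edges:
        h, a = ("hub", src), ("auth", dst)  # hub and authority copies are distinct nodes
        adj[h].add(a)
        adj[a].add(h)

    seen = set()
    components = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        comp = []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        hub_part = sorted(p for side, p in comp if side == "hub")
        auth_part = sorted(p for side, p in comp if side == "auth")
        components.append((hub_part, auth_part))
    return components


components = bipartite_components(edges)
for hub_part, auth_part in components:
    print("hubs:", hub_part, "authorities:", auth_part)
```

The hub side of each component gives one irreducible block of H, and the authority side gives the matching block of A.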
Cutting and Pasting. Part I
• We can now perform the power method on each component for H and A
Cutting and Pasting. Part II
• We can now paste the two components together for each matrix
• We must multiply each entry in the vector by its appropriate weight
H:
A:
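The pasting step can be sketched in Python as follows. The slides do not spell out the weights, so this sketch assumes the usual choice: each component's vector is scaled by the fraction of the set's pages that the component contains. The per-component authority vectors below are also assumptions: they are in-degree proportions for a hypothetical edge list (1→3, 1→6, 2→1, 3→6, 6→3, 6→5, 10→6), which is what the authority chain's stationary vector reduces to on each component. The function name `paste` is illustrative.

```python
from fractions import Fraction


def paste(component_vectors, total_size):
    """Combine per-component score vectors into one global vector.

    component_vectors: list of dicts {page: score}, each summing to 1.
    total_size: number of pages in the whole hub (or authority) set.
    ASSUMPTION: weight of a component = (pages in component) / total_size.
    """
    scores = {}
    for vec in component_vectors:
        weight = Fraction(len(vec), total_size)
        for page, score in vec.items():
            scores[page] = weight * score
    return scores


# ASSUMPTION: per-component authority vectors for the hypothetical edge list,
# as the power method would return them for E = {1} and F = {3, 5, 6}.
auth_E = {1: Fraction(1)}
auth_F = {3: Fraction(1, 3), 5: Fraction(1, 6), 6: Fraction(1, 2)}

authority = paste([auth_E, auth_F], total_size=4)
for page in sorted(authority):
    print(page, authority[page])
```

Under these assumptions E receives weight 1/4 and F receives weight 3/4, so the pasted vector still sums to 1.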
Strengths and Weaknesses
• Less affected by topic drift than HITS
• It gives both authority and hub scores.
• Handles spamming better than HITS, but not nearly as well as PageRank
• Query dependent
Thank You For Your Time!