Link Analysis and Anti-Spam Tie-Yan Liu Microsoft Research Asia.


2005-11-4 "Web Search and Mining" Course @ USTC, 2005

Outline

• First Session
  - Overview of Link Analysis Technologies
  - PageRank and HITS

• Second Session
  - More about Link Analysis Algorithms

• Third Session
  - Spam and Anti-Spam

• Homework

First Session

Typical Search Engine Architecture

Ranking for the Search Results

• Today’s search engines may return millions of pages for a given query.
• The user cannot possibly preview all of these results.
• An appropriate ranking is therefore very helpful:
  - Ranking on relevance
  - Ranking on importance

Traditional IR Ranking

• A ranking based purely on relevance
  - Term frequency (tf)
  - Inverse document frequency (idf)
  - Okapi …
  - Many other aspects that Dr. Shuming Shi will cover in the next course.

Limitations of Traditional IR

• Text-based ranking function
  - www.harvard.edu can hardly be recognized as one of the most authoritative pages for the query “harvard”, since many other web pages contain “harvard” more often.
  - The number of pages with the same relevance is still too large for users to preview.

• Pages are not sufficiently self-descriptive
  - The term “search engine” usually does not appear on the web pages of search engines.

What’s More for Web Search

• In order to solve these problems
  - We must leverage other information on the Web
  - We must distinguish between pages with the same amount of relevance

• Link Analysis
  - The Web is not just a collection of pure-text documents
    • The hyperlinks are also very important!
  - A link from page A to page B may indicate:
    • A is related to B, or
    • A is recommending, citing, voting for or endorsing B
  - Links affect the ranking of web pages and thus have commercial value.

Famous Link Analysis Methods

• HITS
• PageRank

HITS - Kleinberg’s Algorithm

• HITS: Hypertext Induced Topic Selection
• For each vertex v in a subgraph of interest:
  - a(v): the authority of v
  - h(v): the hubness of v

• A site is very authoritative if it receives many citations. Citations from important sites weigh more than citations from less important sites.

• Hubness shows the importance of a site as a hub. A good hub is a site that links to many authoritative sites.

Authority and Hubness

[Figure: node 1 shown twice. In the first graph, nodes 2, 3 and 4 link to node 1; in the second, node 1 links to nodes 5, 6 and 7.]

a(1) = h(2) + h(3) + h(4)
h(1) = a(5) + a(6) + a(7)

Convergence of Authority and Hubness

• Recursive dependency:

  $a(v) \leftarrow \sum_{w \in pa[v]} h(w)$

  $h(v) \leftarrow \sum_{w \in ch[v]} a(w)$

• Using linear algebra, we can prove that a(v) and h(v) converge.

HITS Example

Find a base subgraph:

• Start with a root set R = {1, 2, 3, 4}
• {1, 2, 3, 4}: nodes relevant to the topic
• Expand the root set R to include all the children and a fixed number of parents of nodes in R
  - The result is a new set S (the base subgraph)

HITS Example

Hubs and authorities: two n-dimensional vectors a and h

HubsAuthorities(G)
    1 ← [1, …, 1] ∈ R^|V|
    a_0 ← h_0 ← 1
    t ← 1
    repeat
        for each v in V
            do a_t(v) ← Σ_{w ∈ pa[v]} h_{t-1}(w)
               h_t(v) ← Σ_{w ∈ ch[v]} a_{t-1}(w)
        a_t ← a_t / ||a_t||
        h_t ← h_t / ||h_t||
        t ← t + 1
    until ||a_t – a_{t-1}|| + ||h_t – h_{t-1}|| < ε
    return (a_t, h_t)
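The pseudocode above can be sketched in Python roughly as follows. This is only an illustration, not the lecturer's implementation: the function names, the toy graph, the default tolerance and the iteration cap are all assumptions, and the Euclidean norm is assumed for the normalization step.

```python
import math


def _normalize(vec):
    """Scale a score dict to unit Euclidean length (all zeros stay zero)."""
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {v: (x / norm if norm > 0 else 0.0) for v, x in vec.items()}


def hits(nodes, edges, eps=1e-12, max_iter=1000):
    """Return (authority, hubness) dicts for a directed graph."""
    parents = {v: [] for v in nodes}    # pa[v]: pages that link to v
    children = {v: [] for v in nodes}   # ch[v]: pages that v links to
    for src, dst in edges:
        children[src].append(dst)
        parents[dst].append(src)

    a = {v: 1.0 for v in nodes}
    h = {v: 1.0 for v in nodes}
    for _ in range(max_iter):
        # Both updates read the previous iteration's vectors.
        new_a = _normalize({v: sum(h[w] for w in parents[v]) for v in nodes})
        new_h = _normalize({v: sum(a[w] for w in children[v]) for v in nodes})
        change = (math.sqrt(sum((new_a[v] - a[v]) ** 2 for v in nodes))
                  + math.sqrt(sum((new_h[v] - h[v]) ** 2 for v in nodes)))
        a, h = new_a, new_h
        if change < eps:
            break
    return a, h


# Hypothetical toy graph: pages 1 and 2 both link to 3; page 1 also links to 4.
authority, hubness = hits([1, 2, 3, 4], [(1, 3), (2, 3), (1, 4)])
```

In this toy graph page 3 ends up with the highest authority because it is cited by both hubs, and page 1 is the better hub because it links to both authorities.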

HITS Example Results

[Figure: authority and hubness weights for nodes 1 to 15]

Matrix Notation of HITS

• The authority and hubness vectors calculated by the aforementioned algorithm are the principal left and right singular vectors of the adjacency matrix of the base subgraph.
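The link to singular vectors can be made explicit with a short derivation. It assumes the convention that $A_{ij} = 1$ when page i links to page j (the slides do not state which convention is used); under the transposed convention the roles of "left" and "right" swap.

```latex
% Assumed convention: A_{ij} = 1 if page i links to page j, else 0.
\begin{aligned}
a &\leftarrow A^{T} h, \qquad h \leftarrow A\,a \\
\Rightarrow\quad a &\propto A^{T} A\, a, \qquad h \propto A A^{T} h
\end{aligned}
```

After normalization, a is therefore a principal eigenvector of $A^T A$ and h a principal eigenvector of $A A^T$; under the assumed convention these are the principal right and left singular vectors of A.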

PageRank

• Introduced by Page et al. (1998)
  - The rank of a page is proportional to its parents’ ranks, but inversely proportional to its parents’ outdegrees

Matrix Notation

Adjacency matrix

A = [matrix not preserved in this transcript]

Matrix Notation

• Matrix notation: $r = B r$

• PageRank is embedded in the eigenvector of B associated with the eigenvalue 1.

B = [matrix not preserved in this transcript]
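A minimal sketch of the fixed point r = B r, assuming B is the column-normalized link matrix (B[i][j] = 1/outdegree(j) when page j links to page i), which matches the rule that a page's rank is proportional to its parents' ranks and inversely proportional to their outdegrees. The 3-page graph and the helper names are hypothetical.

```python
def transition_matrix(nodes, edges):
    """B[i][j] = 1/outdegree(j) if node j links to node i, else 0."""
    index = {v: i for i, v in enumerate(nodes)}
    outdeg = {v: 0 for v in nodes}
    for src, _ in edges:
        outdeg[src] += 1
    n = len(nodes)
    B = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        B[index[dst]][index[src]] += 1.0 / outdeg[src]
    return B


def matvec(B, r):
    """Plain matrix-vector product B r."""
    return [sum(b * x for b, x in zip(row, r)) for row in B]


# Hypothetical graph in which every page has an in-link and an out-link.
nodes = [1, 2, 3]
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
B = transition_matrix(nodes, edges)

r = [1.0 / len(nodes)] * len(nodes)
for _ in range(200):          # power iteration: r_t = B r_{t-1}
    r = matvec(B, r)
```

For this graph the iteration settles on ranks proportional to (2, 1, 2): page 2 receives only half of page 1's rank, while pages 1 and 3 pass their full weight around the cycle.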


Markov Chain Notation

• Random surfer model
  - Description of a random walk through the Web graph
  - Interpreted as a transition matrix with asymptotic probability that a surfer is currently browsing that page

$r_t = M r_{t-1}$

M: transition matrix for a first-order Markov chain (stochastic)

Does it converge to some sensible solution (as $t \to \infty$) regardless of the initial ranks?

Problem

• “Rank Sink” Problem
  - In general, many Web pages have no inlinks/outlinks
  - This results in dangling edges in the graph

E.g.
  - no parent → rank 0: $M^T$ converges to a matrix whose last column is all zero
  - no children → no solution: $M^T$ converges to the zero matrix

Modification

• Surfer will restart browsing by picking a new Web page at random

  $M = (B + E)$
  E: escape matrix
  M: stochastic matrix

• Still a problem?
  - It is not guaranteed that M is primitive
  - If M is stochastic and primitive, PageRank converges to the corresponding stationary distribution of M

Distribution of the Mixture Model

• The probability distribution that results from combining the Markovian random walk distribution and the static rank source distribution

  $r = \varepsilon e + (1 - \varepsilon) x$
  ε: probability of selecting a non-linked page

Now, the transition matrix $[\varepsilon H + (1 - \varepsilon) M]$ is primitive and stochastic

$r_t$ converges to the dominant eigenvector (PageRank)
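The mixture model can be sketched as a power iteration. The sketch below makes two assumptions that the slides do not spell out: the static source e is uniform (1/n per page), and a page with no outlinks spreads its rank uniformly over all pages, which is one common way to realize the escape matrix E. The function name, default ε and tolerance are illustrative.

```python
def pagerank(nodes, edges, eps=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for r = eps * e + (1 - eps) * M r with uniform e."""
    n = len(nodes)
    children = {v: [] for v in nodes}
    for src, dst in edges:
        children[src].append(dst)

    r = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        walk = {v: 0.0 for v in nodes}
        dangling_mass = 0.0
        for w in nodes:
            if children[w]:
                share = r[w] / len(children[w])   # rank split over outlinks
                for v in children[w]:
                    walk[v] += share
            else:
                dangling_mass += r[w]             # page without outlinks
        new_r = {v: eps / n + (1.0 - eps) * (walk[v] + dangling_mass / n)
                 for v in nodes}
        change = sum(abs(new_r[v] - r[v]) for v in nodes)
        r = new_r
        if change < tol:
            break
    return r


# Hypothetical chain 1 -> 2 -> 3: page 1 has no parent, page 3 has no children.
ranks = pagerank([1, 2, 3], [(1, 2), (2, 3)])
```

Even though page 1 has no parent and page 3 has no children, every page keeps a positive rank and the ranks still sum to 1, which is the point of the modification.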

PageRank vs. HITS - Algorithm

PageRank vs. HITS - Stability

• Are link analysis algorithms based on eigenvectors stable, in the sense that the results do not change significantly when the link graph is perturbed slightly?
• General strategy for evaluating stability:
  1. Start with the original adjacency matrix, A
  2. Perturb the matrix to get A*: select k nodes in the graph to add or delete
  3. Compute the distance d(r(A), r(A*)), for some distance measure d and objective function r that measures the quality of the results
  4. Compute the amount of perturbation p(A, A*), for some distance function p that measures the amount of perturbation
  5. Evaluate the conditions, if any, under which small values of p generate large values of d

Stability of HITS

• Ng 2001
  - A bound on the number of hyperlinks k that can be added to or deleted from one page without affecting the authority or hubness weights
    • δ: eigengap λ1 – λ2
    • d: maximum outdegree of G

• Observations
  - Stability is determined by the eigengap
  - Eigengap: difference between the 1st and 2nd eigenvalues
    • $A^T A$ for authorities, $A A^T$ for hubs
  - If the eigengap is big, HITS will be insensitive to small perturbations; if it is small, HITS will be sensitive to them

Stability of PageRank

• Looser bound
  - Ng et al. (2001)
  - Bianchini et al. (2001)

• Observations
  - The parameter ε of the mixture model has a stabilizing role
  - If the original k pages to be modified do not have high overall PR scores, then the perturbed scores will not be far from the original

Second Session

Pre-PageRank

• PageRank achieved great success in industry, and many people regarded it as a breakthrough in the research field as well.

• Actually, the basic idea of PageRank had already appeared in many previous works
  - Mark 1988
  - Bray 1996
  - Marchiori 1997
  - ……

Mark 1988

• To calculate the score S of a document at vertex v

  $S(v) = s(v) + \frac{1}{|ch[v]|} \sum_{w \in ch[v]} S(w)$

  v: a vertex in the hypertext graph G = (V, E)
  S(v): the global score
  s(v): the score if the document is isolated
  ch[v]: children of the document at vertex v

• Limitation:
  - Requires G to be a directed acyclic graph (DAG)
  - If v has a single link to w, S(v) > S(w)
  - If v has a long path to w and s(v) < s(w), then S(v) > S(w)

Mark, D. M. (1988), "Network models in geomorphology," Chapter 4 in Modeling in Geomorphologic Systems, edited by M. G. Anderson, John Wiley, pp. 73-97.
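A small sketch of this recursive score, assuming the graph is a DAG as the slide requires and assuming that a vertex without children simply keeps its isolated score s(v). The function and variable names are hypothetical.

```python
def mark_scores(children, local_score):
    """S(v) = s(v) + average of S(w) over the children w of v (DAG only)."""
    memo = {}

    def score(v):
        if v not in memo:
            kids = children.get(v, [])
            avg = sum(score(w) for w in kids) / len(kids) if kids else 0.0
            memo[v] = local_score[v] + avg
        return memo[v]

    return {v: score(v) for v in local_score}


# Hypothetical DAG: a -> b, a -> c, b -> c, every document with s(v) = 1.
children = {"a": ["b", "c"], "b": ["c"], "c": []}
S = mark_scores(children, {"a": 1.0, "b": 1.0, "c": 1.0})
```

The example also shows the second limitation listed above: b has a single link to c, so S(b) = s(b) + S(c) is necessarily larger than S(c) whenever s(b) is positive.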

Bray 1996

• The visibility of a site is measured by the number of other sites pointing to it
  - Authority?
• The luminosity of a site is measured by the number of other sites to which it points
  - Hub?
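Both measures are simple degree counts, as the following sketch illustrates. It assumes links are given as (source site, target site) pairs, that duplicate links between the same two sites count once, and that self-links are ignored; the site names are made up.

```python
from collections import defaultdict


def visibility_and_luminosity(site_links):
    """Count distinct other sites pointing to / pointed to by each site."""
    pointed_by = defaultdict(set)
    points_to = defaultdict(set)
    for src, dst in site_links:
        if src != dst:                      # only *other* sites count
            points_to[src].add(dst)
            pointed_by[dst].add(src)
    sites = set(points_to) | set(pointed_by)
    visibility = {s: len(pointed_by.get(s, ())) for s in sites}
    luminosity = {s: len(points_to.get(s, ())) for s in sites}
    return visibility, luminosity


# Hypothetical site-level links (the duplicate a -> c link counts once).
links = [("a.example", "c.example"), ("b.example", "c.example"),
         ("c.example", "d.example"), ("a.example", "c.example")]
visibility, luminosity = visibility_and_luminosity(links)
```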

Marchiori (1997)

• Hyper information should complement textual information to obtain the overall information

  S(v) = s(v) + h(v)
  - S(v): overall information
  - s(v): textual information
  - h(v): hyper information

• $h(v) = \sum_{w \in ch[v]} F^{r(v, w)} S(w)$
  - F: a fading constant, F ∈ (0, 1)
  - r(v, w): the rank of w after sorting the children of v by S(w)
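A worked sketch of the hyper-information term. It assumes the children are sorted by S(w) in descending order with ranks starting at 1 (the slide does not state the sort direction or the starting rank) and that the children's overall scores are already known. The function name and the numbers are illustrative.

```python
def hyper_information(child_scores, fading=0.5):
    """h(v) = sum over children w of F ** r(v, w) * S(w)."""
    # Assumption: the best child gets rank 1, the next rank 2, and so on.
    ranked = sorted(child_scores, reverse=True)
    return sum(fading ** rank * s for rank, s in enumerate(ranked, start=1))


# Hypothetical page with textual score 1.0 and three children.
h_v = hyper_information([1.0, 4.0, 2.0], fading=0.5)
S_v = 1.0 + h_v   # S(v) = s(v) + h(v)
```

With F = 0.5 the three children contribute 0.5·4 + 0.25·2 + 0.125·1, so each additional child adds less than the one ranked before it.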

Post PageRank

• Following the success of PageRank, many new algorithms were proposed.
  - Fast PageRank calculation (Haveliwala)
  - Topic-sensitive PageRank
  - Personalized PageRank
  - LinkFusion
  - ……

Fast PageRank calculation [Haveliwala – 1999]

• Partition the destination vector into d blocks that each fit into main memory, and compute one block at a time.

• This algorithm is quite similar in structure to the Block Nested-Loop Join algorithm in database systems, which also performs very well for data sets of moderate size but eventually loses out to more scalable approaches.

Fast PageRank calculation [Haveliwala – 2003]

• Basic observation:
  - The convergence rates of the PageRank values of individual pages during application of the Power Method are nonuniform. That is, many pages converge quickly, with a few pages taking much longer to converge. Furthermore, the pages that converge slowly are generally those pages with high PageRank.

Topic-Specific PageRank [Haveliwala - WWW02]

• Topic-specific PageRanks
  - For each page, PageRank values are precomputed per topic; the values for the topics most relevant to the query are used at query time.
  - 16 topics
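One way to picture the query-time step is a weighted sum of the precomputed per-topic ranks. The slide only says that the most relevant topics are used; the linear weighting by query-topic relevance below is an assumption, and the topic names, weights and rank values are invented for illustration.

```python
def topic_sensitive_score(page, topic_ranks, query_topic_weights):
    """Combine precomputed per-topic PageRank values for one page."""
    return sum(weight * topic_ranks[topic].get(page, 0.0)
               for topic, weight in query_topic_weights.items())


# Hypothetical precomputed ranks for two of the topics.
topic_ranks = {
    "sports": {"p1": 0.5, "p2": 0.1},
    "health": {"p1": 0.2, "p2": 0.6},
}
# Hypothetical relevance of the query to each topic (sums to 1).
weights = {"sports": 0.75, "health": 0.25}

score_p1 = topic_sensitive_score("p1", topic_ranks, weights)
score_p2 = topic_sensitive_score("p2", topic_ranks, weights)
```

Because the expensive per-topic vectors are computed offline, only this cheap combination has to run for each query.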

Link Fusion [Zeng, WWW04]

• In a more generalized scenario, suppose there are N data types. The importance attribute of one type of object can be reinforced by both inter- and intra-type links as:

  $L_{urm} = \begin{bmatrix} L_{11} & L_{12} & \cdots & L_{1N} \\ L_{21} & L_{22} & \cdots & L_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ L_{N1} & L_{N2} & \cdots & L_{NN} \end{bmatrix}$

• Suppose w is the attribute vector of all the objects in the URM. Link Fusion can be represented as:

  $w_{new} = L_{urm}^T w_{old}$

• Such iterative calculation can be continued:

  $w_n = (L_{urm}^T)^n w_0$

• The result w is the principal eigenvector of $L_{urm}^T$, which can be interpreted as the value of the data objects with regard to a specific attribute.

Limits of Link Analysis

• Pay-for-place
  - Search engine bias: organizations pay search engines for page rank
  - Advertisements: organizations pay high-ranking pages for advertising space
    • With a primary effect of increased visibility to end users and a secondary effect of increased respectability due to relevance to a high-ranking page

Limits of Link Analysis

• Stability
  - Adding even a small number of nodes/edges to the graph has a significant impact
• Topic drift
  - A top authority may be a hub of pages on a different topic, resulting in increased rank of the authority page
• Content evolution
  - Adding/removing links/content can affect the intuitive authority rank of a page, requiring recalculation of page ranks

Third Session

What is Link Spam

• Since link analysis plays an important role in search engines, it has large commercial value.

• Improving one’s PageRank can directly increase one’s clicks and thus earn more money.

• Link spam is any attempt to unfairly gain a high ranking on a search engine for a web page without improving the user experience, by means of tricky modification or manipulation of the link graph.

Link Spamming Technologies

• Adding outlinks
  - Replicate hub pages
• Adding inlinks
  - Create a honey pot
  - Infiltrate a web directory
  - Post links on blogs, wikis, etc.
  - Participate in link exchange
  - Buy expired domains
  - Create own spam farm

Case Study: Spam HITS

• Hub score can be increased by adding outlinks from the target page to pages with high authority scores.

• Authority score can be increased by creating hyperlinks from high-hub-score pages to the target page.

Case Study: Spam PageRank

• Factors that influence PageRank
  - $PR(t) = PR_{static}(t) + PR_{in}(t) - PR_{out}(t) - PR_{sink}(t)$
• Strategies
  - Own pages are part of the spam farm, maximizing $PR_{static}(t)$
  - Accessible pages point to the spam farm, maximizing $PR_{in}(t)$
  - Links pointing outside the spam farm are suppressed, minimizing $PR_{out}(t)$
  - All pages within the farm have some outlinks, minimizing $PR_{sink}(t)$

Anti-Spam

• Early approaches
  - BHITS, SALSA, DOM, revised HITS, BadRank …
• State of the art
  - TrustRank (2004)
  - Revised PageRank (VLDB2004)
  - BadRank+ (WWW2005)
  - SpamRank (WWW2005, workshop)
  - ……

TrustRank

• Basic assumption
  - Good pages seldom point to spam pages, but spam pages may very likely point to good pages.

• Use TrustRank to denote the goodness of a web page, and use trust propagation to label all web pages, starting from a small human-labeled seed set.

TrustRank

• Step 1: Initialization
  - How to select seeds
    • Inverse PageRank (hub pages, since they have more influence)
    • High PageRank (important pages are more important to search applications)
• Step 2: Propagation

TrustRank

• Step 3:
  - Trust dampening
  - Trust splitting
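The propagation, dampening and splitting steps can be sketched together as a seed-biased PageRank. The formulas on these slides were not preserved, so the update t = α·T·t + (1 - α)·d below, with d uniform over the good seeds, is an assumption about the intended form. The parameter names, the fixed iteration count and the toy graph are hypothetical.

```python
def trustrank(nodes, edges, good_seeds, alpha=0.85, iterations=20):
    """Propagate trust from human-labeled good seeds along outlinks."""
    children = {v: [] for v in nodes}
    for src, dst in edges:
        children[src].append(dst)

    # Static trust source: uniform over the good seed set.
    d = {v: (1.0 / len(good_seeds) if v in good_seeds else 0.0) for v in nodes}
    t = dict(d)
    for _ in range(iterations):
        new_t = {v: (1.0 - alpha) * d[v] for v in nodes}
        for w in nodes:
            if children[w]:
                # Splitting: trust is divided among the outlinks.
                # Dampening: each hop is attenuated by alpha.
                share = alpha * t[w] / len(children[w])
                for v in children[w]:
                    new_t[v] += share
        t = new_t
    return t


# Hypothetical graph: spam pages link to a good page, but not the reverse.
nodes = ["good", "g2", "g3", "spam", "spam2"]
edges = [("good", "g2"), ("g2", "g3"), ("spam", "good"), ("spam2", "spam")]
trust = trustrank(nodes, edges, good_seeds={"good"})
```

Because trust only flows along outlinks of trusted pages, the spam pages in this example receive no trust even though they link to a good page, which reflects the basic assumption stated earlier.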

BadRank+

• Motivation
  - Pages in a spam farm are densely connected, and many common pages exist in both the inlinks and outlinks of these pages.

• Propagate the badness of pages in the seed set to detect the other spam pages on the Web.

BadRank+

• Step 1: Initialization
  - A page is put in the seed set if its inlink and outlink sets share at least 3 common nodes (approximately the same, i.e. with the same domain name)

• Step 2: Expansion
  - ParentPenalty: if a page links to many bad pages (more than a threshold), it will also be labeled as bad.
  - Delete all the links between detected bad pages before PageRank calculation.
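The two steps above can be sketched as follows. The sketch assumes that "approximately the same" means an equal host name as returned by `urllib.parse.urlparse`, and that the ParentPenalty threshold is a plain count. The function names, the threshold value and all URLs are hypothetical.

```python
from urllib.parse import urlparse


def seed_bad_pages(inlinks, outlinks, min_common=3):
    """Step 1: flag pages whose in/out link sets share >= min_common domains."""
    bad = set()
    for page in set(inlinks) | set(outlinks):
        in_domains = {urlparse(u).netloc for u in inlinks.get(page, [])}
        out_domains = {urlparse(u).netloc for u in outlinks.get(page, [])}
        if len(in_domains & out_domains) >= min_common:
            bad.add(page)
    return bad


def expand_bad_pages(outlinks, bad, threshold=2):
    """Step 2 (ParentPenalty): pages linking to > threshold bad pages are bad."""
    bad = set(bad)
    changed = True
    while changed:                     # repeat until no new page is labeled
        changed = False
        for page, targets in outlinks.items():
            hits = sum(1 for t in set(targets) if t in bad)
            if page not in bad and hits > threshold:
                bad.add(page)
                changed = True
    return bad


# Hypothetical data for step 1.
inlinks = {"http://farm.example/p": ["http://a.example/1", "http://b.example/1",
                                     "http://c.example/1"],
           "http://ok.example/": ["http://a.example/1"]}
outlinks = {"http://farm.example/p": ["http://a.example/2", "http://b.example/2",
                                      "http://c.example/2"],
            "http://ok.example/": ["http://a.example/2"]}
seeds = seed_bad_pages(inlinks, outlinks)

# Hypothetical data for step 2, using short page ids.
links = {"x": ["s1", "s2", "s3"], "y": ["s1"], "z": ["x", "s1", "s2"]}
labeled = expand_bad_pages(links, {"s1", "s2", "s3"})
```

Page z is only caught after x has been labeled, which is why the expansion loops until the bad set stops growing.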

Revised PageRank

• Assumption
  - The pages in a spam farm have high correlation with each other.

• Approach
  - Increase the probability of jumping from nodes with large correlation coefficients.

Revised PageRank

• Step 1: Collusion detection
  - Calculate PageRank values for different ε
  - Calculate the correlation coefficient between the curve of node x’s PageRank and 1/ε, denoted by co-co(x).

• Step 2: ε personalization
  - Use $F(\varepsilon_{default}, \text{co-co}(x))$ to personalize the original matrix U.
  - Recalculate PageRank.

SpamRank

• Key assumption
  - Supporters of an honest page should not be overly dependent on one another, i.e. they should be spread across sources of different quality.
  - Due to self-similarity, the honest supporter set should have a power-law distribution of PageRank.
  - Spammers have a limited budget, so they do not replicate the unimportant structures.

Summary

• Current work on anti-spam is very limited.

• Promising research directions
  - Use more statistics and the properties of the transition probability matrix to detect spam
  - Design a new spam-free ranking function

Homework

Technical Report Writing

1. HITS and PageRank are both based on simple linear algebra. Can you design some other link analysis algorithm based on advanced linear algebra or matrix factorization?
2. The performance / sensitivity of PageRank with respect to the smoothing factor ε.
3. How to speed up the calculation of PageRank using matrix factorization, or some specific characteristics of the Markov chain?
4. PageRank is the eigenvector of a 2-D matrix. Can LinkFusion then be the eigenvector of a 3-D tensor?
5. Stability analysis for other link analysis algorithms.
6. A survey on the state-of-the-art spam technologies.
7. How to design a search engine that is robust to spam?
8. Other novel research topics related to link analysis.

Requirements

• Send the report to [email protected] before Dec 4 (within 1 month).

• The length should not be less than 8 pages, with the template at http://www.acm.org/sigs/pubs/proceed/template.html

• There must be something new and interesting in your report, and you’d better use some experiments to support your idea.

• Never try to copy or steal already-published ideas as your technical report. We are sure we have read much more than you can find.

Other Information

• Slides can be found at http://research.microsoft.com/users/tyliu/