Natural Language Processing Lab National Taiwan University The splog Detection Task and A Solution...

Page 1

Natural Language Processing Lab, National Taiwan University

The splog Detection Task and A Solution Based on Temporal and Link Properties

Yu-Ru Lin et al., NEC America

TREC 2006 (Blog session)

Presenter: Chun-Yuan Teng

Page 2

Splog characteristics
• Machine-generated content
• No value-addition
  – No unique information for their readers
• Hidden agenda, usually an economic goal
  – Commercial intention

Page 3

Uniqueness of splogs
• Dynamic content
  – Unlike web spam, a splog generates fresh content to drive traffic
• Non-endorsement links
  – Traditionally, a hyperlink is treated as an endorsement of the page it points to
  – Spammers can create hyperlinks in normal blogs, so links in blogs are not necessarily endorsements

Page 4

Features to detect splogs
• Traditional features
  – Tokenized URL, blog and post titles, homepage content, and post content
• Temporal regularity
  – Temporal content regularity / temporal structural regularity
• Link regularity
  – Consistency in target websites
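The traditional features treat text fields as bags of words. A minimal sketch of how a URL might be tokenized for such features, assuming a split on non-alphanumeric characters and a small stop list; `tokenize_url` and the stop list are hypothetical, not taken from the slides:

```python
import re
from collections import Counter


def tokenize_url(url: str) -> Counter:
    """Split a URL into lowercase word tokens and count them.

    Hypothetical helper: splits on runs of non-alphanumeric characters
    and drops scheme/host boilerplate tokens from an assumed stop list.
    """
    stop = {"http", "https", "www", "com", "net", "org", "html", "htm"}
    tokens = [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t and t not in stop]
    return Counter(tokens)
```

For example, `tokenize_url("http://cheap-loans-now.blogspot.com/2006/01/best-loans.html")` counts `loans` twice; the same bag-of-words treatment could be applied to titles and post content.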

Page 5

Temporal Content Regularity
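The figure on this slide did not survive extraction. As an illustrative stand-in (an assumption, not necessarily the authors' formulation), temporal content regularity can be approximated by how similar each post is to the next one in time order: a blog that recycles machine-generated text scores close to 1. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def temporal_content_regularity(posts: List[str]) -> float:
    """Mean cosine similarity between consecutive posts (assumed time-ordered).

    Hypothetical proxy: higher values mean more repetitive content over time.
    """
    vecs = [Counter(p.lower().split()) for p in posts]
    if len(vecs) < 2:
        return 0.0
    sims = [cosine(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]
    return sum(sims) / len(sims)
```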

Page 6

Temporal Structural Regularity
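The details of this slide were also in a figure. One plausible way to quantify structural regularity, assumed here rather than taken from the slides, is the entropy of the gaps between consecutive post times: a blog that posts on a fixed schedule has low entropy. `temporal_structural_regularity` and its `bin_size` parameter (in seconds) are hypothetical.

```python
import math
from collections import Counter
from typing import List


def temporal_structural_regularity(timestamps: List[float], bin_size: float = 3600.0) -> float:
    """Entropy (in bits) of the binned gaps between consecutive post times.

    Hypothetical proxy: lower entropy means more regular posting intervals.
    """
    ts = sorted(timestamps)
    gaps = [ts[i + 1] - ts[i] for i in range(len(ts) - 1)]
    if not gaps:
        return 0.0
    # Group gaps into bins of bin_size seconds, then compute Shannon entropy.
    bins = Counter(int(g // bin_size) for g in gaps)
    n = len(gaps)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())
```

Posts published exactly every two hours give an entropy of 0, while three gaps falling into three different bins give log2(3) bits.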

Page 7

Link Regularity estimation
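The estimation procedure on this slide was in a figure. A simple proxy for "consistency in target website", assumed here for illustration only and not the authors' estimator, is the share of a blog's outgoing links that go to its most frequently linked host. `link_regularity` is a hypothetical name.

```python
from collections import Counter
from typing import List
from urllib.parse import urlparse


def link_regularity(outlinks: List[str]) -> float:
    """Share of outgoing links that point to the most frequently linked host.

    Hypothetical proxy: 1.0 means every link targets the same website;
    smaller values mean the links are spread over more sites.
    """
    hosts = [urlparse(u).netloc.lower() for u in outlinks]
    hosts = [h for h in hosts if h]  # ignore entries without a host
    if not hosts:
        return 0.0
    return Counter(hosts).most_common(1)[0][1] / len(hosts)
```

For instance, a blog with three links to `shop.example.com` and one to `news.example.org` scores 0.75.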

Page 8

Two kinds of spam detection

• Offline detection
  – Traditional measurement
• Online detection
  – Detect spam online
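The slide only distinguishes the two settings. Offline detection scores a blog once all of its posts are available; one way to simulate online detection, assumed here rather than described on the slide, is to re-score the blog as each post arrives and stop at the first post where the score crosses a threshold. `detect_online`, `score_fn`, `threshold`, and `min_posts` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def detect_online(
    posts: Sequence[str],
    score_fn: Callable[[List[str]], float],
    threshold: float = 0.5,
    min_posts: int = 2,
) -> Optional[int]:
    """Re-score a blog each time a post arrives.

    Returns the number of posts seen when the blog is first flagged as a
    splog, or None if the score never reaches the threshold.
    """
    seen: List[str] = []
    for post in posts:
        seen.append(post)
        # Only score once enough posts exist to measure any regularity.
        if len(seen) >= min_posts and score_fn(seen) >= threshold:
            return len(seen)
    return None
```

For example, with a toy `score_fn` returning the fraction of posts that mention "loan", the posts `["hello world", "cheap loan", "loan deal", "loan now"]` with `threshold=0.6` are flagged at the third post (score 2/3). An offline detector would instead call `score_fn` once on the full list.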

Page 9

Experimental Results (Offline)

Page 10

Experimental results (Offline)

Page 11

Online indexing in blog search engine

Page 12

Online test

Page 13

Online test in this paper

Page 14

Experimental results

Page 15

Conclusion and contributions

• Modeling the splog problem
  – The uniqueness of splogs
• Regularity-based detection
  – Content and post time
• Evaluation
  – Online evaluation