Page 1:

Juice: A Longitudinal Study of an SEO Campaign

David Y. Wang, Stefan Savage, and Geoffrey M. Voelker

University of California, San Diego

Page 2:

Background

• A Black Hat Search Engine Optimization (SEO) campaign is a coordinated effort to obtain user traffic through abusive means
  – Supported by a botnet of compromised Web sites
  – Poisons search results
  – Feeds traffic to scams (e.g., Fake Anti-Virus)

• Link juice refers to the backlinks (references) a site receives
  – Believed to influence search result ranking

Page 3:

[Diagram: Attacker, Doorway]

We begin with an attacker and a targeted Web site

Page 4:

[Diagram: Attacker, Doorway; step (1)]

The attacker compromises the Web site using an open vulnerability and installs an SEO kit

Page 5:

[Diagram: Attacker, Doorway, Search Engine Web Crawler; steps (1) to (2); request: GET /volcano.html]

When a Web crawler tries to fetch a page…

Page 6:

[Diagram: Attacker, Doorway, Search Engine Web Crawler; steps (1) to (2); request: GET /volcano.html]

The crawler receives a page intended to rank well

Page 7:

[Diagram: Attacker, Doorway, Search Engine Web Crawler; steps (1) to (3); request: GET /volcano.html]

The page gets indexed by Google

Page 8:

[Diagram: Attacker, Doorway, Search Engine Web Crawler, User; steps (1) to (4); request: GET /volcano.html; search query: “volcano”]

When a user searches in Google and clicks on the compromised page…

Page 9:

[Diagram: Attacker, Doorway, Search Engine Web Crawler, User, Scams; steps (1) to (5); request: GET /volcano.html; search query: “volcano”]

…the user is redirected to a scam of the attacker’s choosing
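The walkthrough in steps (1) to (5) amounts to cloaking: the doorway answers a crawler and a search visitor differently. A minimal sketch of that decision follows. The `Request` type, the function names, the User-Agent and referrer substrings, and the idea that direct visitors see the original page are all assumptions made for illustration, not details taken from the GR kit.

```python
from dataclasses import dataclass

# Assumed, illustrative substrings; a real kit may use other signals entirely.
CRAWLER_TOKENS = ("googlebot", "bingbot")
SEARCH_REFERRERS = ("google.",)


@dataclass(frozen=True)
class Request:
    path: str
    user_agent: str
    referrer: str = ""


def doorway_response(req: Request) -> str:
    """Return which kind of content a cloaking doorway would serve."""
    ua = req.user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        # Steps (2)-(3): the crawler gets a page intended to rank well.
        return "seo_page"
    if any(ref in req.referrer.lower() for ref in SEARCH_REFERRERS):
        # Steps (4)-(5): a visitor arriving from a search result is redirected.
        return "redirect_to_scam"
    # Assumption: direct visitors (e.g. the site owner) see the original page.
    return "original_page"


def looks_cloaked(crawler_view: str, user_view: str) -> bool:
    """Flag a page when the crawler and the search visitor see different content."""
    return crawler_view != user_view
```

A measurement crawler can use the same idea in reverse: fetch a page once as a crawler and once as a search visitor, then compare the two views.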

Page 10:

Our Contributions

• Infiltrate an influential SEO botnet (GR)
  – In-depth characterization of GR’s operation
  – One-time leader in poisoned search results on Google
  – Our work builds on previous work studying search result poisoning [John11, Lu11, Moore11]

• Draw insights by combining data from three separate data sources (crawlers):
  – Estimate GR’s effectiveness
  – Examine impact of scams funding GR

Page 11:

SEO Kit

• An SEO kit is software installed on compromised sites
  – Allows backdoor access for the botmaster
  – Performs Black Hat SEO (i.e., cloaking, content generation, user redirection)
  – Typically consists of obfuscated code snippets injected into pages

<?php if(!function_exists('cm4y2wui5w153')) {
 function cm4y2wui5w153($smcx) {$dix5xk='x);';…}
?>

<?php
// General
define("GR_CACHE_ID", "v8_cache");
define("GR_SCRIPT_VERSION", "v8.0 (28.02.2012)");
?>

Page 12:

Anecdote

• Obtained a copy of the GR SEO kit by contacting owners of compromised sites
  – Roughly 40 attempts
  – A handful were willing to help
  – But only 1 person was able to disinfect their site and send us the kit

• The SEO kit allows us to infiltrate the botnet and understand how the campaign works

Page 13:

GR Botnet Architecture

• The GR botnet is built using pull mechanisms and comprises 3 types of hosts:
  – Compromised Web sites act as doorways for visitors and control which content is returned
  – The Directory Server’s only role is to return the location of the C&C Server
  – The C&C Server acts as a centralized content server for the GR botmaster

Page 14:

Example of User Visit

[Diagram: Compromised Web Sites, Directory Server, C&C Server]

User requests a page from a compromised site: HTTP GET index.html

Page 15:

Example of User Visit

[Diagram: Compromised Web Sites, Directory Server, C&C Server]

Compromised site tries to look up the location of the C&C: "Where is the C&C?"

Page 16:

Example of User Visit

[Diagram: Compromised Web Sites, Directory Server, C&C Server]

Compromised site looks up the location of the C&C Server. Directory Server: "The C&C is @ 1.2.3.4"

Page 17:

Example of User Visit

[Diagram: Compromised Web Sites, Directory Server, C&C Server]

Compromised site fetches content to return to the user from the C&C Server: "What should I return to the user?"

Page 18:

Example of User Visit

[Diagram: Compromised Web Sites, Directory Server, C&C Server]

Compromised site fetches content to return to the user from the C&C Server. C&C Server: "Here are some scams for the user"

Page 19:

Example of User Visit

[Diagram: Compromised Web Sites, Directory Server, C&C Server]

User is redirected to scams
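The user-visit walkthrough describes a pull-based lookup: the compromised site asks the Directory Server where the C&C is, then asks the C&C what to return. The in-memory simulation below is only a sketch of that message flow. All class and method names are hypothetical, a plain dictionary stands in for the network, and picking the first returned scam is an arbitrary choice; none of this reflects the actual GR wire protocol.

```python
class DirectoryServer:
    """Only role: return the current location of the C&C server."""

    def __init__(self, cnc_address: str):
        self._cnc_address = cnc_address

    def locate_cnc(self) -> str:
        return self._cnc_address


class CncServer:
    """Centralized content server; hands out redirect targets on request."""

    def __init__(self, scam_urls):
        self._scam_urls = list(scam_urls)

    def content_for_user(self):
        return list(self._scam_urls)


class CompromisedSite:
    """Doorway that pulls everything it needs at request time."""

    def __init__(self, directory: DirectoryServer, network: dict):
        self._directory = directory
        self._network = network  # address -> CncServer, a stand-in for HTTP

    def handle_user_visit(self, path: str) -> dict:
        cnc_address = self._directory.locate_cnc()  # "Where is the C&C?"
        cnc = self._network[cnc_address]
        scams = cnc.content_for_user()  # "What should I return to the user?"
        # Arbitrary choice for the sketch: redirect to the first scam offered.
        return {"requested": path, "redirect_to": scams[0] if scams else None}


# Usage with placeholder addresses and URLs.
cnc = CncServer(["http://scam.example/landing"])
site = CompromisedSite(DirectoryServer("1.2.3.4"), {"1.2.3.4": cnc})
result = site.handle_user_visit("/index.html")
```

Because the doorway only pulls, the botmaster can move the C&C by updating a single directory entry, without contacting any compromised site.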

Page 20:

Data Collection

• We collect data using 3 distinct crawlers
  – Odwalla crawls and monitors compromised sites in the GR botnet (October 2011 – June 2012)
  – Dagger measures poisoned search results for trending searches (April 2011 – August 2011)
  – Trajectory crawls pages using a Web browser to follow redirects (April 2011 – August 2011)

• Although the timeframes do not overlap cleanly, we can still draw insights

Page 21:

Odwalla

• Odwalla crawls GR’s topology
• Begins with poisoned search results [Dagger]
• Takes advantage of two characteristics of the compromised sites in GR:
  – Sites respond to the C&C protocol by returning diagnostic information (easy confirmation)
  – Sites are cross-linked with other compromised sites in order to manipulate search rankings (find more compromised sites)
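The two characteristics above suggest a breadth-first crawl: confirm a candidate with the handshake, then follow its cross links to find more candidates. The sketch below assumes two caller-supplied functions, `probe` and `outlinks`, which are hypothetical stand-ins for the real network operations; it is not Odwalla's actual implementation.

```python
from collections import deque


def crawl_topology(seeds, probe, outlinks):
    """Breadth-first discovery of compromised sites.

    seeds    -- candidate sites, e.g. taken from poisoned search results
    probe    -- hypothetical callable: site -> True if it answers the C&C handshake
    outlinks -- hypothetical callable: site -> iterable of cross-linked sites
    """
    seeds = list(seeds)
    confirmed = set()
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        site = queue.popleft()
        if not probe(site):
            continue  # not confirmed as a member, so its links are not followed
        confirmed.add(site)
        for neighbor in outlinks(site):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return confirmed
```

Following links only from confirmed sites keeps the crawl inside the botnet instead of wandering across the wider Web.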

Page 22:

Results

• What are the characteristics of GR?
  – Size, churn, lifetime

• How effective is GR in poisoning Google?
  – We focus on how many poisoned search results are exposed to the user

• Longitudinal data allows us to identify long-term trends
  – Monetization through scams

Page 23:

GR Size + Churn

• GR is modest in size
• There is little churn amongst nodes

[Figure: number of compromised Web sites over time, Nov 11 to Jul 12; y-axis 0 to 1000; legend: sum, mac, oem, v7, v8]

Page 24:

GR Lifetime

• We define lifetime as the time between the first and last time Odwalla observed the SEO kit running on a site

• A site is sanitized when it no longer responds to the C&C protocol for 8 consecutive days
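The two definitions above translate directly into date arithmetic. The sketch below assumes the crawler probed each site once per day, so that a day without an observation means the site did not respond; the function names and the daily-probe assumption are illustrative and not taken from the slides.

```python
from datetime import date, timedelta

SANITIZED_AFTER_DAYS = 8  # from the slide: 8 consecutive days without a response


def lifetime(observed_days):
    """Time between the first and last day the SEO kit was observed running."""
    days = sorted(observed_days)
    return days[-1] - days[0]


def is_sanitized(observed_days, last_crawl_day):
    """True if the site was silent for at least 8 consecutive days.

    Assumes one probe per day up to and including last_crawl_day, so every day
    after the last observation counts as a missed response.
    """
    silent_days = (last_crawl_day - max(observed_days)).days
    return silent_days >= SANITIZED_AFTER_DAYS


# Usage with made-up observation dates.
observed = [date(2012, 1, 1), date(2012, 2, 10), date(2012, 3, 1)]
site_lifetime = lifetime(observed)
```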

Page 25:

GR Lifetime

• Compromised sites are long-lived (months at a time) and able to support GR with high availability

• SEO kits want to hide their presence from site owners

[Figure: histogram of the number of sanitized sites (y-axis, 0 to 600) by lifetime in months (x-axis bins: < 1, 1–2, 2–3, 3–4, 4–5, 5–6, 6–7, 7–8, > 8, *)]

Page 26:

Effectiveness

• Measure effectiveness of GR by the volume of poisoned search results

• Intersect known compromised sites [Odwalla] with poisoned search results on Google [Dagger]

• Label each poisoned search result as:
  – Active: cloaking + redirecting users
  – Tagged: neutralized via Google Safe Browsing
  – Dormant: cloaking, but not redirecting users
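The intersection and the three labels can be sketched as two small functions. The function names, matching by hostname, the precedence given to a Safe Browsing flag, and the extra "unlabeled" fallback are assumptions made for illustration; the slides do not spell out these details.

```python
from urllib.parse import urlsplit


def gr_poisoned_results(poisoned_urls, compromised_hosts):
    """Keep the poisoned search results whose host is a known GR site."""
    return [u for u in poisoned_urls if urlsplit(u).hostname in compromised_hosts]


def label_result(cloaking: bool, redirecting: bool, flagged_by_gsb: bool) -> str:
    """Label one poisoned search result as active, tagged, or dormant."""
    if flagged_by_gsb:
        return "tagged"  # assumed precedence: a Safe Browsing flag wins
    if cloaking and redirecting:
        return "active"
    if cloaking:
        return "dormant"
    return "unlabeled"  # hypothetical fallback, not one of the slide's labels


# Usage with placeholder URLs.
hits = gr_poisoned_results(
    ["http://a.example/volcano.html", "http://b.example/other.html"],
    {"a.example"},
)
```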

Page 27:

Effectiveness

• Multiple periods of activity: Start, Surge, Steady, Idle

[Figure: number of poisoned search results over time, Apr 11 to Oct 11; y-axis 0 to 5000; legend: total, active, dormant, tagged]

Page 28:

Effectiveness

Start, Surge, Steady, Idle

Start period: mostly tagged, active ramping up

[Figure: number of poisoned search results over time, Apr 11 to Oct 11; y-axis 0 to 5000; legend: total, active, dormant, tagged]

Page 29:

Effectiveness

Start, Surge, Steady, Idle

Surge period: active surges with little pressure from Google Safe Browsing (GSB)

[Figure: number of poisoned search results over time, Apr 11 to Oct 11; y-axis 0 to 5000; legend: total, active, dormant, tagged]

Page 30:

Effectiveness

Start, Surge, Steady, Idle

Steady period: tagged increases, but many active still present

[Figure: number of poisoned search results over time, Apr 11 to Oct 11; y-axis 0 to 5000; legend: total, active, dormant, tagged]

Page 31:

Effectiveness

Start, Surge, Steady, Idle

Idle period: total volume drops, lack of monetization

[Figure: number of poisoned search results over time, Apr 11 to Oct 11; y-axis 0 to 5000; legend: total, active, dormant, tagged]

Page 32:

Market Share

• Compare GR against all poisoned search results
• GR accounts for the majority of poisoned search results during the surge period (58%)

[Figure: number of poisoned search results over time, Apr 11 to Oct 11; y-axis 0 to 5000; legend: All, GR]

Page 33:

Monetization

• To identify the final scam from redirection data [Trajectory], we select redirect chains that:
  – Originate from a GR doorway
  – Contain 1+ cross-site redirect
  – Occur while mimicking MSIE

• Manually cluster + classify scams
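The three selection criteria can be expressed as a filter over recorded redirect chains. The `RedirectChain` type and its fields are hypothetical, and treating any change of hostname as a cross-site redirect is a simplifying assumption; Trajectory's actual data format is not described in the slides.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class RedirectChain:
    urls: list        # ordered URLs: doorway first, final landing page last
    user_agent: str   # User-Agent string the crawling browser presented


def host(url: str):
    return urlsplit(url).hostname


def is_candidate(chain: RedirectChain, gr_doorways: set) -> bool:
    """Apply the three criteria: GR doorway, 1+ cross-site hop, MSIE."""
    if not chain.urls or host(chain.urls[0]) not in gr_doorways:
        return False
    hosts = [host(u) for u in chain.urls]
    # Assumption: a hop is cross-site when consecutive hostnames differ.
    cross_site = any(a != b for a, b in zip(hosts, hosts[1:]))
    return cross_site and "MSIE" in chain.user_agent


def final_scam(chain: RedirectChain) -> str:
    """The last URL in the chain is taken as the scam landing page."""
    return chain.urls[-1]
```

The landing pages returned by `final_scam` would then be the input to the manual clustering and classification step.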

Page 34:

Monetization

• Experimentation with affiliate programs
• Early on, Fake AV is the scam of choice

[Figure: percentage of redirect chains by scam category over time, Apr 11 to Dec 11; y-axis 0 to 100; legend: fakeav, pharma, oem, mov, ppc, error, misc, driveby]

Page 35:

Monetization

• FBI crackdown on the Fake AV industry sent GR into flux

[Figure: percentage of redirect chains by scam category over time, Apr 11 to Dec 11; y-axis 0 to 100; legend: fakeav, pharma, oem, mov, ppc, error, misc, driveby]

Page 36:

Conclusion

• GR is very effective at poisoning search results even with modest resources

• Fake AV was the financial motivation that drove innovation in GR (the killer scam)

• Purely technical interventions had some effect, but it was the financial intervention that forced GR into retirement

Page 37:

Thank You!

• Questions?

Page 38:

Odwalla Example

[Diagram: Site_0, Site_1, Site_2; search terms: Super Bowl, Beyonce, Super Bowl]

Odwalla wants to test whether Site_0 is part of GR

Page 39:

Odwalla Example

[Diagram: Site_0, Site_1, Site_2; search terms: Super Bowl, Beyonce, Super Bowl]

Odwalla uses the C&C protocol to initiate a handshake with Site_0

Page 40:

Odwalla Example

[Diagram: Site_0, Site_1, Site_2; search terms: Super Bowl, Beyonce, Super Bowl]

Version: v MAC 1 (28.10.2011)
Cache ID: v7mac_cache
Host ID: example.com

Site_0 responds with diagnostic info, confirming membership in GR
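The diagnostic response shown on this slide is a set of "key: value" lines, which makes confirmation easy to automate. The parser below is a minimal sketch. The function names and the rule that all three keys must be present are assumptions; only the sample text itself comes from the slide.

```python
EXPECTED_KEYS = {"Version", "Cache ID", "Host ID"}  # keys seen on the slide


def parse_diagnostics(text: str) -> dict:
    """Split each 'key: value' line at the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def confirms_membership(text: str) -> bool:
    """Assumed rule: a response with all three keys is a GR member."""
    return EXPECTED_KEYS.issubset(parse_diagnostics(text))


# Usage with the response text from the slide.
sample = "Version: v MAC 1 (28.10.2011)\nCache ID: v7mac_cache\nHost ID: example.com\n"
info = parse_diagnostics(sample)
```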

Page 41:

Odwalla Example

[Diagram: Site_0, Site_1, Site_2; search terms: Super Bowl, Beyonce, Super Bowl]

In addition, we discover Site_0 juicing Site_1 and Site_2

Page 42:

Odwalla Example

[Diagram: Site_0, Site_1, Site_2; search terms: Super Bowl, Beyonce, Super Bowl]

Odwalla tests Site_1 and Site_2