Summarization and opinion detection in product reviews

Summarization and Opinion Detection in Product Reviews. Team: Suman Papanaboina, Swapnil Patil, Shubham Srivastava, Spandana Otra. Project Mentor: Aditya Joshi.


Transcript of Summarization and opinion detection in product reviews

Page 1: Summarization and opinion detection in product reviews

Summarization and Opinion Detection in Product Reviews

Team:
Suman Papanaboina
Swapnil Patil
Shubham Srivastava
Spandana Otra

Project Mentor:
Aditya Joshi

Page 2: Summarization and opinion detection in product reviews

Project Motivation

• As e-commerce becomes more and more popular, the number of customer reviews that a product receives grows rapidly.

• For a popular product, the number of reviews can be in the hundreds or even thousands.

 

Page 3: Summarization and opinion detection in product reviews

Project Motivation

• This makes it difficult for a potential customer to read the reviews and make an informed decision on whether to purchase the product.

• It also makes it difficult for the manufacturer of the product to keep track of and manage customer opinions.

Page 4: Summarization and opinion detection in product reviews

Project Objective

• Provide a structured, feature-based summary for new customers by mining product reviews.

 

Page 5: Summarization and opinion detection in product reviews

How is it different from traditional summarization?

• We mine only the product features on which customers have expressed opinions, and whether those opinions are positive or negative.

• We do not summarize the reviews by selecting a subset of the original sentences or rewriting them to capture the main points, as classic text summarization does.

 

Page 6: Summarization and opinion detection in product reviews

End-to-End Architecture

[Architecture diagram. Components: Crawler, UI, REST Service, Sentence Splitter/Preprocessor, Feature/Opinion Extractor, Frequent Feature Identifier, Feature Pruner, Sentiment Analyzer, Summarizer, Persistence (MySQL)]

Page 7: Summarization and opinion detection in product reviews

Crawler Module

[Diagram: Flipkart -> Jsoup Scraping Tool -> Persister -> MySQL]

Crawled the following information:
• Product name
• Rating
• Review comment
• Commenting user
• Comment date/time
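The crawled fields listed above could be modelled and persisted roughly as follows. This is only a sketch under stated assumptions: the `Review` class, the `reviews` table, and the column names are hypothetical, and Python's built-in `sqlite3` stands in for the MySQL persister so that the example runs without a database server. The scraping step itself (Jsoup in the original system) is not shown.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class Review:
    # Field names are illustrative; they mirror the crawled items on the slide.
    product_name: str
    rating: int
    comment: str
    reviewer: str
    commented_at: str  # timestamp kept as ISO 8601 text


def open_store(path=":memory:"):
    """Open a store and create the (hypothetical) reviews table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews ("
        "product_name TEXT, rating INTEGER, comment TEXT, "
        "reviewer TEXT, commented_at TEXT)"
    )
    return conn


def persist(conn, reviews):
    """Insert a batch of crawled reviews in one transaction."""
    conn.executemany(
        "INSERT INTO reviews VALUES (?, ?, ?, ?, ?)",
        [astuple(r) for r in reviews],
    )
    conn.commit()
```

Batching the inserts with `executemany` keeps one crawl page to one round trip, which is likely to matter more against a networked database than against the in-memory store used here.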

Page 8: Summarization and opinion detection in product reviews

Sentence Splitter/Preprocessor

[Diagram: Review -> Sentence Splitter (OpenNLP) -> Sentence -> Preprocessor (stop-word filter, stemming); results are stored through the MySQL Persister]
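To make the splitting and preprocessing steps concrete, here is a minimal sketch. It is not the OpenNLP pipeline used in the project: the regular-expression splitter, the tiny stop-word set, and the suffix-stripping `naive_stem` function are simplified stand-ins chosen so the example needs only the standard library.

```python
import re

# Illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "it", "this", "to", "but"}


def split_sentences(review):
    """Split after '.', '!' or '?' followed by whitespace (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]


def naive_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Lowercase, tokenize, drop stop words, then stem what is left."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("The battery is great.")` keeps only the content words `battery` and `great`.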

Page 9: Summarization and opinion detection in product reviews

Feature/Opinion Extractor Module

[Diagram: Sentence -> Stanford Dependency Parser -> Extract nsubj, amod, nn -> Find any negations -> Persister -> MySQL]

Page 10: Summarization and opinion detection in product reviews

Feature/Opinion Extractor Module

• Used the Stanford dependency parser.

• Extract only the nsubj, amod, and nn relations. These pairs turn out to be the required feature/opinion pairs.

• Identify any negations expressed and adjust the opinion accordingly.
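The extraction rules above can be sketched over dependency triples. This is an illustration, not the project's code: it assumes the parser output has already been converted to `(relation, governor, dependent)` tuples in Stanford typed-dependency order, and the `extract_pairs` function and the `"not "` prefix for negated opinions are hypothetical choices. It also skips the part-of-speech checks a real system would likely need, for instance to ignore nsubj relations whose governor is a verb.

```python
def extract_pairs(deps):
    """Return (feature, opinion) pairs from one sentence's dependency triples.

    deps: list of (relation, governor, dependent) tuples, e.g.
          ("nsubj", "good", "life") for "life is good".
    """
    # nn(life, battery) marks a noun compound: read it as "battery life".
    compounds = {gov: f"{dep} {gov}" for rel, gov, dep in deps if rel == "nn"}
    # neg(good, not) marks the governor as negated.
    negated = {gov for rel, gov, _dep in deps if rel == "neg"}

    pairs = []
    for rel, gov, dep in deps:
        if rel == "nsubj":    # nsubj(opinion, feature)
            feature, opinion = dep, gov
        elif rel == "amod":   # amod(feature, opinion)
            feature, opinion = gov, dep
        else:
            continue
        feature = compounds.get(feature, feature)
        if opinion in negated:
            opinion = "not " + opinion
        pairs.append((feature, opinion))
    return pairs
```

For "The battery life is not good", the triples nn(life, battery), nsubj(good, life), and neg(good, not) yield the single pair `("battery life", "not good")`.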

 

Page 11: Summarization and opinion detection in product reviews

Frequent Feature Identification

• We define a frequent feature as a feature that appears in more than 3 sentences (this threshold is configurable).

• We used the Apache Mahout library to find frequent patterns.
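The frequency threshold can be illustrated with plain support counting. Note that this is a simplification: the project used Mahout's frequent pattern mining (FP-Growth), which also finds multi-item patterns, whereas the hypothetical `frequent_features` function below only counts how many sentences mention each single feature.

```python
from collections import Counter


def frequent_features(sentence_features, min_sentences=3):
    """Return features that appear in more than `min_sentences` sentences.

    sentence_features: one list of extracted features per sentence.
    """
    support = Counter()
    for features in sentence_features:
        # set() ensures a feature is counted once per sentence.
        support.update(set(features))
    return {f for f, n in support.items() if n > min_sentences}
```

With the default threshold, a feature mentioned in four sentences is kept and one mentioned in three is dropped, matching the "more than 3 sentences" definition.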

 

 

Page 12: Summarization and opinion detection in product reviews

Frequent Feature Identification

[Diagram: Features and Sentences -> Mahout Frequent Pattern Miner (FP-Growth/FP-tree) -> Frequent Features -> Persister -> MySQL]

Page 13: Summarization and opinion detection in product reviews

Redundancy Pruning

• We define a feature X as redundant if:
– X is part of another feature, and
– X does not appear on its own in at least 3 sentences (the threshold is configurable; our system currently uses 3).

• After implementing this technique, we are able to eliminate redundant features: of battery, life, and battery life, only battery life is kept.
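The two-part redundancy rule can be written down directly. The sketch below assumes the standalone counts (sentences in which a feature appears without the longer feature containing it) have been computed elsewhere; `prune_redundant` and `standalone_counts` are hypothetical names.

```python
def prune_redundant(features, standalone_counts, min_standalone=3):
    """Drop features that are part of another feature and rarely occur alone.

    features: list of feature strings, e.g. ["battery", "battery life"].
    standalone_counts: feature -> number of sentences where it appears
                       on its own (missing entries count as 0).
    """
    kept = []
    for x in features:
        # Word-boundary containment: "life" is part of "battery life",
        # but "art" is not part of "battery".
        part_of_other = any(
            x != other and f" {x} " in f" {other} " for other in features
        )
        if part_of_other and standalone_counts.get(x, 0) < min_standalone:
            continue
        kept.append(x)
    return kept
```

Given `["battery", "life", "battery life"]` with low standalone counts for the single words, only `battery life` survives, as on the slide. A word that does occur alone often enough is kept even if a longer feature contains it.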

 

Page 14: Summarization and opinion detection in product reviews

Redundancy Pruning

[Diagram: Battery, life, battery life -> Redundancy Pruner -> Battery life]

Page 15: Summarization and opinion detection in product reviews

Junk Features

• Some reviews contain sentences such as "Flipkart services are awesome". In this case our system extracts "service" as the feature and "awesome" as the opinion, although the sentence is about the retailer rather than the product.

[Diagram: Frequent Features -> Junk Feature Pruner (reads the Junk Feature File) -> Output Features]
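A junk feature pruner driven by a file of unwanted features might look like the sketch below. The file format is an assumption (one feature per line, blank lines ignored), as are the function names; the slides do not describe the actual format of the junk feature file.

```python
def load_junk_features(lines):
    """Build the junk set from an iterable of lines (e.g. an open file)."""
    return {line.strip().lower() for line in lines if line.strip()}


def prune_junk(features, junk):
    """Keep only features that are not listed as junk (case-insensitive)."""
    return [f for f in features if f.lower() not in junk]
```

In use, the junk set would be loaded once, for example with `load_junk_features(open("junk_features.txt"))` where the file name is hypothetical, and then applied to the frequent features of every product.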

Page 16: Summarization and opinion detection in product reviews

Sentiment Analysis

[Diagram: Opinion Words -> Sentiment Analyzer, which draws on SentiWordNet, a Positive Seed List, and a Negative Seed List]
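One plausible way to combine the seed lists with a lexicon is sketched below. How the project actually combined them is not stated on the slide, so the lookup order (seed lists first, lexicon score as fallback), the example seed words, and the `lexicon_scores` dictionary standing in for SentiWordNet are all assumptions.

```python
# Illustrative seed words only; the project's actual lists are not given.
POSITIVE_SEEDS = {"good", "great", "awesome", "excellent"}
NEGATIVE_SEEDS = {"bad", "poor", "terrible", "slow"}


def polarity(opinion, lexicon_scores=None):
    """Classify an opinion phrase as 'positive', 'negative' or 'neutral'.

    opinion: an opinion word, optionally prefixed with "not ".
    lexicon_scores: word -> signed score, a stand-in for SentiWordNet.
    """
    lexicon_scores = lexicon_scores or {}
    words = opinion.lower().split()
    negated = bool(words) and words[0] == "not"
    word = words[-1] if words else ""

    if word in POSITIVE_SEEDS:
        score = 1.0
    elif word in NEGATIVE_SEEDS:
        score = -1.0
    else:
        score = lexicon_scores.get(word, 0.0)

    if negated:
        score = -score  # a negation flips the orientation
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Under these assumptions `polarity("not good")` is negative, and a word absent from both seed lists falls back to its lexicon score.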

Page 17: Summarization and opinion detection in product reviews

Summarizer

• The Summarizer generates a feature-based structured summary.
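A feature-based structured summary can be built by grouping opinion sentences per feature and orientation. The sketch below is an assumption about the output shape, since the summary screenshot is not part of this transcript; `summarize` and `render` are hypothetical names.

```python
from collections import defaultdict


def summarize(opinions):
    """Group sentences by feature and polarity.

    opinions: iterable of (feature, polarity, sentence) tuples, where
              polarity is "positive" or "negative" (others are skipped).
    """
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, pol, sentence in opinions:
        if pol in ("positive", "negative"):
            summary[feature][pol].append(sentence)
    return dict(summary)


def render(summary):
    """Render per-feature positive/negative counts as plain text."""
    lines = []
    for feature in sorted(summary):
        groups = summary[feature]
        lines.append(f"Feature: {feature}")
        lines.append(f"  Positive: {len(groups['positive'])}")
        lines.append(f"  Negative: {len(groups['negative'])}")
    return "\n".join(lines)
```

Keeping the sentences, not just the counts, lets a UI drill down from a feature to the individual sentences behind each count.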

Page 18: Summarization and opinion detection in product reviews

Feature Summary REST Service

• We implemented a REST service to provide the following functionality to the UI:

– Find the list of categories in the system
– Find the list of products for a given category
– Find the feature-based summary for a given product

• We used the Grizzly embedded container to implement the REST service.
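The three operations above map naturally onto three resource paths. The sketch below shows only the routing logic as a pure function; the URL layout, the sample data, and the `handle` function are hypothetical, and no HTTP server (Grizzly in the original system) is involved.

```python
import json

# Hypothetical in-memory data standing in for the MySQL-backed store.
CATALOG = {"mobiles": ["phone-a", "phone-b"]}
SUMMARIES = {"phone-a": {"battery life": {"positive": 12, "negative": 3}}}


def handle(path):
    """Map a request path to (status_code, json_body)."""
    parts = [p for p in path.split("/") if p]

    # GET /categories
    if parts == ["categories"]:
        return 200, json.dumps(sorted(CATALOG))

    # GET /categories/<category>/products
    if len(parts) == 3 and parts[0] == "categories" and parts[2] == "products":
        products = CATALOG.get(parts[1])
        if products is None:
            return 404, "{}"
        return 200, json.dumps(products)

    # GET /products/<product>/summary
    if len(parts) == 3 and parts[0] == "products" and parts[2] == "summary":
        summary = SUMMARIES.get(parts[1])
        if summary is None:
            return 404, "{}"
        return 200, json.dumps(summary)

    return 404, "{}"
```

Separating routing from the transport layer in this way makes the three operations easy to test without starting a container.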

Page 19: Summarization and opinion detection in product reviews

UI  

Page 20: Summarization and opinion detection in product reviews

Screenshots/Home page

Page 21: Summarization and opinion detection in product reviews

Screenshots/Feature-based summary

Page 22: Summarization and opinion detection in product reviews

Screenshots/Individual  sentences  

Page 23: Summarization and opinion detection in product reviews

Screenshots/Complete  review  

Page 24: Summarization and opinion detection in product reviews

Evaluation

No. of feature-opinion pairs manually extracted: 20
No. of initial feature-opinion pairs extracted by our system: 40
After frequent pattern mining: 25
After pruning (final stage): 18
No. of correct feature-opinion pairs: 15
No. of incorrect feature-opinion pairs: 3
Precision: 15/20 (75%)
Recall: 18/20 (90%)
F1-measure, (2 * precision * recall) / (precision + recall): 0.81
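For reference, the metrics can be recomputed from the counts above. The sketch uses the standard definitions, precision = correct / extracted and recall = correct / manually identified, which is an assumption about what was intended. With 15 correct pairs out of 18 extracted and 20 manually identified, these definitions give a precision of about 83%, a recall of 75%, and an F1 of about 0.79. These differ from the reported 15/20 and 18/20, so the slide may be using different definitions. The reported F1 of 0.81 is consistent with the slide's own precision and recall values (0.818 before truncation).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(correct, extracted, relevant):
    """Standard definitions, given counts of pairs.

    correct:   pairs extracted by the system that are right
    extracted: all pairs the system output at the final stage
    relevant:  pairs identified manually (the gold standard)
    """
    precision = correct / extracted if extracted else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall, f1_score(precision, recall)


# Counts taken from the evaluation slide.
p, r, f1 = precision_recall_f1(correct=15, extracted=18, relevant=20)
```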

Page 25: Summarization and opinion detection in product reviews

Conclusion

• It was a great learning experience for all of us. We were excited to apply data mining and natural language processing techniques to implement the system.

• We believe this system can help users quickly identify what is good or bad in a product based on other users' comments. It also gives manufacturers a better perspective on users' comments, which can aid in providing business intelligence.

Page 26: Summarization and opinion detection in product reviews

Future Enhancements

• Add more rules to improve the overall accuracy of feature/opinion identification.

• Migrate the entire system to run on Hadoop YARN, using HBase instead of MySQL.

• Try unsupervised/supervised machine learning approaches for feature/opinion identification.

• Replace our home-grown crawler with a more robust open-source crawler, Apache Nutch (https://nutch.apache.org/).