Detecting Online Commercial Intention (OCI)
Honghua Dai, Zaiqing Nie, Lee Wang, Lingzhi Zhao, Ji-Rong Wen, Ying Li
WWW’06
Advisor: Chia-Hui Chang
Student: Teng-Kai Fan
Date: 2009-08-24
Outline
- Introduction
- Defining OCI: Online Commercial Intention
- Learning Online Commercial Intention
  - Web Page OCI Detector
  - Query OCI Detector
- Experiment
- Conclusion
Introduction
- Two major online user activities:
  - Browsing activity
  - Searching activity
- Three categories of user search intention:
  - Navigational: reach a particular web site.
  - Informational: acquire information from web pages.
  - Transactional: perform some “web-mediated” activity.
Introduction cont.
- OCI (Online Commercial Intention): determining whether a user intends to purchase something or to participate in a commercial service.
Defining OCI (Online Commercial Intention)
- OCI is defined as a function from a query or a Web page to a binary value: Commercial or Non-Commercial.
- The goal is to compute two functions:
  - OCI: Q → {Commercial, Non-Commercial}
  - OCI: P → {Commercial, Non-Commercial}
Learning Online Commercial Intention
- Taxonomy-based approach
  - Using existing concept hierarchies or categories.
- Machine learning approach
  - Extracting features from page content and building classifiers based on those features.
- Labeling process: human-evaluation approach.
Web Page OCI Detector
- Input: a Web page P
- Output: OCI (commercial or non-commercial) of P
- Classifier: SVM
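Keyword Extraction and Selection
- Keyword extraction: from both the inner text and the tag attributes of all the training data.
- Feature selection:
  - Pr(k|C): the probability of the keyword k occurring in a Web page belonging to class C ($\bar{C}$ denotes the other class).

$$\mathrm{Sig}(k) = \frac{\max\{\Pr(k \mid C),\ \Pr(k \mid \bar{C})\}}{\Pr(k \mid C) + \Pr(k \mid \bar{C})} - \frac{1}{2}$$

$$\mathrm{Freq}(k) = \Pr(k \mid C \cup \bar{C})$$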
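The two selection measures can be illustrated with a small sketch. It assumes the significance is the larger class-conditional probability divided by the sum of both, minus 1/2, and that the frequency is the share of all training pages containing the keyword. The function names and the thresholds `min_sig` and `min_freq` are hypothetical; the slide does not say how the two measures are combined into a selection rule.

```python
from collections import Counter


def keyword_stats(pages, labels):
    """Return {keyword: (sig, freq)}.

    pages  -- list of keyword collections, one per training page
    labels -- list of bools, True for a commercial page
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for keywords, is_commercial in zip(pages, labels):
        # Count each keyword at most once per page (document frequency).
        (pos_df if is_commercial else neg_df).update(set(keywords))

    stats = {}
    for k in set(pos_df) | set(neg_df):
        p_c = pos_df[k] / n_pos if n_pos else 0.0    # Pr(k | C)
        p_nc = neg_df[k] / n_neg if n_neg else 0.0   # Pr(k | not C)
        # Assumed form of Sig(k): 0 when k is equally likely in both classes,
        # 0.5 when it occurs in only one class.
        sig = max(p_c, p_nc) / (p_c + p_nc) - 0.5
        freq = (pos_df[k] + neg_df[k]) / len(labels)  # Pr(k | all pages)
        stats[k] = (sig, freq)
    return stats


def select_keywords(stats, min_sig, min_freq):
    """Hypothetical rule: keep keywords that pass both thresholds."""
    return sorted(k for k, (sig, freq) in stats.items()
                  if sig >= min_sig and freq >= min_freq)
```

A keyword that occurs only in commercial pages gets the maximum significance of 0.5, while one that is equally likely in both classes gets 0 and would be dropped by any positive threshold.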
Keyword Extraction and Selection cont.
- Define two properties for each keyword k in a page p:

$$\mathrm{nit}(k, p) = \frac{\#\ \text{of elements in } p \text{ in which keyword } k \text{ appears in the inner text}}{\text{total number of elements in page } p}$$

$$\mathrm{nta}(k, p) = \frac{\#\ \text{of elements in } p \text{ in which keyword } k \text{ appears in the tag attributes}}{\text{total number of elements in page } p}$$

- A page p with n keywords can then be represented in 2n dimensions.
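A minimal sketch of the two per-keyword properties, using Python's standard `html.parser`. Several choices here are assumptions that the slides do not specify: text is credited only to the innermost open element, keyword matching is a case-insensitive substring test, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementCollector(HTMLParser):
    """Collect, for every element, its direct text and its attribute values."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._stack = []

    def handle_starttag(self, tag, attrs):
        element = {"text": [], "attrs": [v for _, v in attrs if v]}
        self.elements.append(element)
        if tag not in VOID_TAGS:
            self._stack.append(element)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack:
            self._stack[-1]["text"].append(data)


def nit_nta(html, keyword):
    """Return (nit, nta) for one keyword in one page."""
    parser = ElementCollector()
    parser.feed(html)
    parser.close()
    total = len(parser.elements)
    if total == 0:
        return 0.0, 0.0
    kw = keyword.lower()
    in_text = sum(1 for e in parser.elements
                  if kw in " ".join(e["text"]).lower())
    in_attrs = sum(1 for e in parser.elements
                   if kw in " ".join(e["attrs"]).lower())
    return in_text / total, in_attrs / total


def page_vector(html, keywords):
    """Represent a page as 2n values: (nit, nta) for each of n keywords."""
    vector = []
    for k in keywords:
        vector.extend(nit_nta(html, k))
    return vector
```

Re-parsing the page once per keyword keeps the sketch short; a real pipeline would parse once and reuse the collected elements.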
Query OCI Detector
- Four types of data sources for query OCI:
  - Constituent terms of the search query.
    - Ex.: “airline ticket deals”, “digital camera price”.
  - Content of the top landing pages recommended by the search engine.
  - Content of the search result page, including titles, short descriptions, and URL links.
  - The number of user clicks on the landing pages recommended by the search engine.
Detecting OCI based on Top Search Result Landing Pages
- Using the top-10 result pages generated by MSN.
- Using the Web page OCI detector to detect the OCI of the top 10 landing pages.
- $p_{q,n}$ is the Web page that has rank $n$ in the search result of query $q$.
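One way to picture this step is as a rank-ordered list of page-level OCI scores for a query. The sketch below is an illustration only: `landing_page_features`, `toy_page_oci`, and the zero padding for missing results are hypothetical, and the slides do not state how the ten page-level results are combined into a single query-level decision.

```python
from typing import Callable, List, Sequence


def landing_page_features(pages: Sequence[str],
                          page_oci: Callable[[str], float],
                          top_n: int = 10) -> List[float]:
    """Score the top_n landing pages of a query, in rank order.

    pages    -- landing page contents, ordered by search rank (rank 1 first)
    page_oci -- any page-level OCI scorer (stand-in for the page detector)
    """
    scores = [page_oci(page) for page in pages[:top_n]]
    # Assumption: queries with fewer than top_n results are padded with 0.0.
    scores += [0.0] * (top_n - len(scores))
    return scores


def toy_page_oci(page: str) -> float:
    """Hypothetical stand-in scorer: 1.0 if the page mentions 'buy'."""
    return 1.0 if "buy" in page.lower() else 0.0
```

For example, `landing_page_features(["Buy a camera", "Camera history"], toy_page_oci, top_n=3)` yields one score per rank, which a query-level model could then take as input.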
Detecting OCI based on Top Search Result Landing Pages cont.
Detecting OCI based on First Search Result Page
Experiments
- Data
  - 1408 US English queries.
  - Collect the first search result page for the 1408 queries.
  - Collect the top 10 landing pages for the 1408 queries.
  - Randomly pick 26186 English Web pages.
- Labeling Analysis
Evaluation Methodology
- For the Web page OCI detector: because the classes are unbalanced, the authors train the model on all commercial pages plus an equal number of non-commercial pages.
- For the query OCI detector:
  - Compare the model based on the first search result page with the model based on the top N result landing pages.
  - Use 3-fold cross-validation.
- Measures: Precision, Recall, and F-Measure
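The balancing step and the 3-fold protocol can be sketched as follows. This is a plain-Python illustration under stated assumptions: the non-commercial examples are drawn uniformly at random, folds are formed by striding over the list, and the function names and the `seed` parameter are hypothetical.

```python
import random


def balance_by_undersampling(examples, labels, seed=0):
    """Keep all commercial examples and an equal number of non-commercial ones."""
    rng = random.Random(seed)
    commercial = [x for x, y in zip(examples, labels) if y]
    non_commercial = [x for x, y in zip(examples, labels) if not y]
    # Assumption: the non-commercial subset is a uniform random sample.
    sampled = rng.sample(non_commercial,
                         min(len(commercial), len(non_commercial)))
    data = [(x, True) for x in commercial] + [(x, False) for x in sampled]
    rng.shuffle(data)
    return data


def k_fold_splits(items, k=3):
    """Yield (train, test) pairs; each item is in the test set exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]
```

With 3 commercial and 7 non-commercial examples, the balanced set has 6 items, and each of the 3 folds holds out 2 of them for testing.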
Evaluating Page OCI Detector
CP (Precision), CR (Recall), CF (F-measure)
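A short sketch of how the three measures are computed, assuming that the "C" prefix means they are measured on the commercial class and that the F-measure is the balanced harmonic mean of precision and recall. The function name is hypothetical.

```python
def commercial_prf(y_true, y_pred):
    """Return (CP, CR, CF) for the commercial class.

    y_true, y_pred -- sequences of bools, True meaning commercial
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)        # correctly flagged commercial
    fp = sum(1 for t, p in pairs if not t and p)    # wrongly flagged commercial
    fn = sum(1 for t, p in pairs if t and not p)    # missed commercial
    cp = tp / (tp + fp) if tp + fp else 0.0
    cr = tp / (tp + fn) if tp + fn else 0.0
    # Assumption: balanced F-measure (harmonic mean of CP and CR).
    cf = 2 * cp * cr / (cp + cr) if cp + cr else 0.0
    return cp, cr, cf
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch, not something the slides specify.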
Evaluating Page OCI Detector cont.
Evaluating Query OCI Detector
OCI Analysis for a Stratified Query Sample based on Query Frequency
- Query frequency is divided into 5 levels: Single, Very low, Low, Mid, and High.
- 10000 queries are randomly selected for each level.
- Observation: query sets with higher frequency contain a larger proportion of queries with commercial intention.
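The stratified sampling can be sketched as bucketing queries by frequency and drawing a fixed number from each bucket. The level names come from the slide, but the frequency boundaries in `bounds` are purely hypothetical placeholders, as are the function names; the slides do not give the actual cut-offs.

```python
import random
from collections import defaultdict

LEVELS = ["Single", "Very low", "Low", "Mid", "High"]


def frequency_level(count, bounds=(1, 10, 100, 1000)):
    """Map a query's frequency to a level; `bounds` are hypothetical cut-offs."""
    for level, upper in zip(LEVELS, bounds):
        if count <= upper:
            return level
    return LEVELS[-1]


def stratified_sample(query_counts, per_level, seed=0):
    """Randomly pick up to per_level queries from each frequency level."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for query, count in query_counts.items():
        buckets[frequency_level(count)].append(query)
    return {level: rng.sample(sorted(buckets[level]),
                              min(per_level, len(buckets[level])))
            for level in LEVELS}


def commercial_share(queries, query_oci):
    """Fraction of queries that a given detector labels commercial."""
    if not queries:
        return 0.0
    return sum(1 for q in queries if query_oci(q)) / len(queries)
```

Applying `commercial_share` to each level's sample, with any query-level detector passed as `query_oci`, gives the per-level proportions that the observation compares.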
Conclusion
- The authors present a framework for building machine learning models that learn the OCI of queries and Web pages based on the content of any Web page.