Artificial Intelligence techniques for Information Retrieval in Web
Presented by
Hamid R. Chinaei
1 October 2007
Outline
Information Retrieval
Document Content
User Behavior
Markov Chains
The Proposed Models
Conclusion
IR Architecture
[Diagram labels: Query String; Document Corpus; IR System; Ranked Documents (1. Doc1, 2. Doc2, 3. Doc3, ...)]
Document Content
Document Content (set of words + their weights)
User Behavior
Query submissions
Clicks on documents
Time spent reading the document
Query refinements
System Modeling
[Diagram labels: User; System; Query 1, Query 2, ..., Query n; Ranked Documents; Document Description; Clicks + Time; Ranks; Update]
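Markov Chains [5]
$$U \rightarrow \theta_Q \rightarrow q, \qquad S \rightarrow \theta_D \rightarrow d$$
$$C = \{d_1, \ldots, d_n\}, \qquad a = [d_i, \ldots, d_j]$$
$$\theta = (\theta_Q, \theta_D, R), \qquad R \sim p(R \mid \theta_Q, \theta_D)$$
$$\mathcal{R}(a \mid U, q, S, C) = \int L(a, \theta)\, p(\theta \mid U, q, S, C)\, d\theta$$
$$L(\theta_Q, \theta_D, R = 0) = c_0, \qquad L(\theta_Q, \theta_D, R = 1) = c_1$$
$$\mathcal{R}(a = d,\ q) = c_0\, p(R = 0 \mid q, d) + c_1\, p(R = 1 \mid q, d)$$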
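As a worked special case of the two-cost risk, assume $c_0 = 1$ (cost of returning a non-relevant document) and $c_1 = 0$ (cost of returning a relevant one). These values are an illustration and are not given on the slide. The risk then reduces to the probability of non-relevance, so ranking documents by increasing risk is the same as ranking them by decreasing probability of relevance:

$$\mathcal{R}(a = d,\ q) = 1 \cdot p(R = 0 \mid q, d) + 0 \cdot p(R = 1 \mid q, d) = 1 - p(R = 1 \mid q, d)$$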
Markov Chains cont’d
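[Diagram: chain $q_0 \rightarrow d_0 \rightarrow q_1 \rightarrow d_1$ with transition probabilities $p(d_0 \mid q_0)$, $p(q_1 \mid d_0)$, $p(d_1 \mid q_1)$]
$$p(d \mid q) \propto p(q \mid d)\, p(d)$$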
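The relation p(d | q) ∝ p(q | d) p(d) can be turned into a ranking rule: score each document by the product of its query likelihood and its prior, then sort. The following is a minimal sketch under that assumption only. The class name `BayesRanker`, the method `rank`, and the way the two probability tables are supplied are hypothetical and do not come from the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BayesRanker {
    /**
     * Ranks document ids by p(q|d) * p(d), highest score first.
     * queryLikelihood maps a document id to p(q|d) for one fixed query q;
     * prior maps a document id to p(d). A missing prior is treated as 0.
     */
    static List<String> rank(Map<String, Double> queryLikelihood, Map<String, Double> prior) {
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Double> e : queryLikelihood.entrySet()) {
            score.put(e.getKey(), e.getValue() * prior.getOrDefault(e.getKey(), 0.0));
        }
        List<String> docs = new ArrayList<>(score.keySet());
        // Descending order of the unnormalised posterior; ties keep an arbitrary order.
        docs.sort((a, b) -> Double.compare(score.get(b), score.get(a)));
        return docs;
    }
}
```

For example, `BayesRanker.rank(Map.of("d0", 0.2, "d1", 0.1), Map.of("d0", 0.1, "d1", 0.5))` places d1 first, because 0.1 × 0.5 exceeds 0.2 × 0.1. The normalising constant p(q) is the same for every document, so it can be ignored for ranking.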
Inference Networks
[Diagram: inference network with three layers. Document Layer: D0, D1, ..., Dn. Concept Layer: w0, w1, ..., wm. Query Layer: Q. Other labels: C, N, R]
Example
Example Cont’d
Example Cont’d
Example Cont’d
POMDPs
Observations: user query, document clicked by the user, time spent on the document
Rewards: time spent on a document
States: the concept the user is looking for
Actions: ranking the documents
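$$\pi_t(b) = \arg\max_{a \in A} \Big[ R(b, a) + \gamma \sum_{o \in O} P(o \mid b, a)\, V_{t-1}(b') \Big]$$
$$V_t(b) = \max_{a \in A} \Big[ R(b, a) + \gamma \sum_{o \in O} P(o \mid b, a)\, V_{t-1}(b') \Big], \quad \text{s.t. } b, a, o \rightarrow b'$$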
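The value recursion relies on a belief b over the hidden state (the concept the user is looking for) being revised after each observation. A minimal sketch of that revision follows, under the simplifying assumption that the user's concept does not change between steps, so the new belief is proportional to P(o | s) · b(s). The class name `BeliefUpdate`, the array representation, and the static-state assumption are illustrative and not taken from the slides.

```java
final class BeliefUpdate {
    /**
     * Bayesian belief update over concepts.
     * belief[s] is the current probability of concept s;
     * obsLikelihood[s] is P(o | s) for the observation just received
     * (for example a query or a click). Returns the normalised new belief.
     */
    static double[] update(double[] belief, double[] obsLikelihood) {
        double[] next = new double[belief.length];
        double norm = 0.0;
        for (int s = 0; s < belief.length; s++) {
            next[s] = obsLikelihood[s] * belief[s];
            norm += next[s];
        }
        if (norm == 0.0) {
            // The observation has zero probability under every concept: keep the old belief.
            return belief.clone();
        }
        for (int s = 0; s < next.length; s++) {
            next[s] /= norm;
        }
        return next;
    }
}
```

Starting from a uniform belief over two concepts, an observation that is four times as likely under the first concept (likelihoods 0.8 and 0.2) moves the belief to roughly 0.8 and 0.2.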
POMDPs cont’d
[Diagram labels: Q0, Q1, Q2; q0, q1, q2; T1, T2; U0, U1, U2; a; d1 ... dn with T1 ... Tn; UP]
Example of a System Belief
Conclusion
With AI techniques, the users (not the search engine) eventually rank the documents
– This can improve any ranking algorithm
It also resists the effect that search engines have on which web pages survive or drop out [2,3]
Experiment Setup
Data
– AOL User Session Collection [4]
Database
– MySQL, 277 MB data, 216 MB index
Length
– Experiments currently run on 1,500,000 clickthrough records (one tenth of the available clickthrough data)
Application in Java
– So far more than 500 lines of code, not counting comments and test cases
Classes
URL
Query
User (for the purpose of user modeling)
Term
IR (run class)
Class Diagrams
Data Schema
aolLogTable
– AnonID: 1205043
– Query: "public records"
– QueryTime: 2006-04-06 03:19:42.0
– URLRank: 1
– URL: http://www.searchsystems.net
Example: SearchSystems.net
Search result snippet:
SearchSystems.net - The Largest Public Records Directory
SearchSystems.net is the internet's largest directory of public records databases, Search for all these records public, property, Federal, State,Local,...
www.searchsystems.net/ - 39k - Similar pages
Page meta tags:
<meta name="description" content="SearchSystems.net is the internet's largest directory of public records databases,Search for all these records public, property, Federal, State, Local, national, vital, Tax, geneaology, court, social security, documents, judgments, probation, laws, civil, suit, court" />
<meta name="keywords" content="records, public, directory, Federal, State, Local, national, vital, Tax, genealogy, court, social security, documents, judgments, probation, laws, civil, suit, court, action, lien, USA, certificates, lawsuits, offenders, court, civil, information" />
Example cont’d
Result set for SearchSystems.net
resultSet = SELECT a.AnonID AS AnonID, a.Query AS Query, a.QueryTime AS QueryTime, a.URLRank AS URLRank, a.URL AS URL
    FROM aolLogTable a
    WHERE a.URL = 'http://www.searchsystems.net';
Sample Results
| AnonID | Query | QueryTime | Rank | URL |
|---|---|---|---|---|
| 10422043 | germany 1850 | 2006-05-07 13:00:28.0 | 54 | http://www.searchsystems.net |
| 10432858 | tax liens in gretna | 2006-05-28 14:30:04.0 | 2 | http://www.searchsystems.net |
| 10434732 | search public records | 2006-05-22 21:12:41.0 | 1 | http://www.searchsystems.net |
| 10559651 | free unclaimed propert search | 2006-03-28 17:10:35.0 | 3 | http://www.searchsystems.net |
| 10825800 | free criminal offense search | 2006-04-06 23:15:20.0 | 1 | http://www.searchsystems.net |
| 10971516 | public records | 2006-05-09 23:01:09.0 | 1 | http://www.searchsystems.net |
| 11199274 | mentor ohio criminal records | 2006-05-22 19:42:12.0 | 1 | http://www.searchsystems.net |
| 11412322 | texas public records of birth | 2006-04-14 10:51:14.0 | 6 | http://www.searchsystems.net |
| 11412322 | free inmate locator | 2006-04-23 17:39:21.0 | 17 | http://www.searchsystems.net |
| 11655138 | public court records bakersfield | 2006-04-09 15:56:52.0 | 2 | http://www.searchsystems.net |
| 11752893 | free online public records | 2006-05-26 20:32:45.0 | 1 | http://www.searchsystems.net |
Observation 1
Number of clicks for URLs increases exponentially
[Plot labels: www.microsoft.com; www.searchsystems.com]
Getting Query Chains
resultSet = SELECT a.AnonID AS AnonID, a.Query AS Query, a.QueryTime AS QueryTime, a.URLRank AS URLRank, a.URL AS URL
    FROM aollogtable1 a
    WHERE a.AnonID = _AnonID AND a.QueryTime < _QueryTime
    ORDER BY a.QueryTime DESC;
This result set, the same user's earlier rows with the newest first, is consumed by the recursive calls that build query chains (see next slide)
Getting Query Chains cont’d
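// Walks back through the same user's earlier rows (newest first) and collects the
// URLs of the current query chain. Uses java.sql.ResultSet, java.sql.Timestamp,
// java.sql.SQLException and java.util.List; timeThresh is assumed to be in milliseconds.
void preURLsRecursive(ResultSet resultSet, Timestamp queryTime, List<String> chain) throws SQLException {
    if (resultSet.next()) {
        Timestamp queryTimePrime = resultSet.getTimestamp("QueryTime");
        // The earlier row belongs to the chain only if it is close enough in time.
        if (queryTime.getTime() - queryTimePrime.getTime() < timeThresh) {
            chain.add(resultSet.getString("URL"));
            preURLsRecursive(resultSet, queryTimePrime, chain);
        }
    }
}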
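The same chain-building idea can be sketched without a database: given one user's earlier log rows sorted newest first, keep walking back while consecutive queries are closer together than the threshold. This is an illustrative in-memory version, not the application's code. The names `QueryChains`, `LogRow` and `chainBefore` are hypothetical, and the threshold value is left to the caller because the slides do not state it.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

final class QueryChains {
    /** One click-log row; the fields mirror the aolLogTable columns. */
    record LogRow(long anonId, String query, LocalDateTime queryTime, int urlRank, String url) {}

    /**
     * Returns the rows that precede `last` in the same query chain, newest first.
     * earlierRows must contain the same user's rows with queryTime before last,
     * sorted by queryTime descending (as an ORDER BY ... DESC query returns them).
     */
    static List<LogRow> chainBefore(LogRow last, List<LogRow> earlierRows, Duration timeThresh) {
        List<LogRow> chain = new ArrayList<>();
        LocalDateTime current = last.queryTime();
        for (LogRow row : earlierRows) {
            // A gap that reaches the threshold ends the chain.
            if (Duration.between(row.queryTime(), current).compareTo(timeThresh) >= 0) {
                break;
            }
            chain.add(row);
            current = row.queryTime();
        }
        return chain;
    }
}
```

The loop replaces the recursion, which avoids deep call stacks for long sessions. The gap is measured between consecutive queries rather than from the final query, matching the recursive call that passes the earlier timestamp along.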
Sample of Results
| AnonID | Query | QueryTime | Rank | URL |
|---|---|---|---|---|
| 484518 | indiana state prison | 2006-03-06 13:24:22.0 | 1 | http://www.in.gov |
| 484518 | morgan county indiana jail | 2006-03-06 13:27:38.0 | 1 | http://scican3.scican.net |
| 484518 | indiana inmate locator | 2006-03-06 13:28:54.0 | 1 | http://www.in.gov |
| 484518 | fugitives of indiana | 2006-03-06 13:37:51.0 | 1 | http://www.criminalwatch.com |
| 484518 | indiana fugitives caught | 2006-03-06 13:39:12.0 | 0 | |
| 484518 | west virgina public records wills | 2006-03-06 13:40:48.0 | 0 | |
| 484518 | west virgina public records | 2006-03-06 13:41:11.0 | 0 | |
| 484518 | west virginia public records | 2006-03-06 13:41:18.0 | 1 | http://www.searchsystems.net |

In the rows with rank 0 and no URL, the user has not clicked any result
Observation 2: Term Weights
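We used the data logs to obtain the weight of word w for URL d, R(w, d):
$q_1, \ldots, q_n$ are the queries in which word w occurs
$q_1, \ldots, q_m$ are all queries for URL d
$\mathrm{Rank}(q_i, d)$ is the rank of URL d for query $q_i$
$$R(q_i, w, d) = \frac{1 / \mathrm{Rank}(q_i, d)}{\sum_{j=1}^{m} \mathrm{Rank}(q_j, d)}$$
$$R(w, d) = \sum_{i=1}^{n} R(q_i, w, d)$$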
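The term-weight computation can be sketched as follows, reading the formula as: each query containing word w contributes the reciprocal of the URL's rank for that query, divided by the sum of the URL's ranks over all of its queries, and R(w, d) adds up these contributions. That reading of the denominator, the whitespace tokenisation, counting a word once per query, and the names `TermWeights`, `RankedQuery` and `weights` are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TermWeights {
    /** A query that led to a click on URL d, with the rank at which d was shown. */
    record RankedQuery(String query, int rank) {}

    /** Computes R(w, d) for every word w in the queries logged for one URL d. */
    static Map<String, Double> weights(List<RankedQuery> queriesForUrl) {
        // Denominator: sum of Rank(q_j, d) over all queries for d (rows without a click are skipped).
        double rankSum = 0.0;
        for (RankedQuery q : queriesForUrl) {
            if (q.rank() > 0) rankSum += q.rank();
        }
        Map<String, Double> weights = new HashMap<>();
        for (RankedQuery q : queriesForUrl) {
            if (q.rank() <= 0) continue;
            // R(q_i, w, d) = (1 / Rank(q_i, d)) / rankSum, shared by every distinct word of q_i.
            double contribution = (1.0 / q.rank()) / rankSum;
            Set<String> words = new LinkedHashSet<>(List.of(q.query().trim().toLowerCase().split("\\s+")));
            for (String w : words) {
                if (!w.isEmpty()) weights.merge(w, contribution, Double::sum);
            }
        }
        return weights;
    }
}
```

With the three logged queries "search public records" (rank 1), "public records" (rank 1) and "tax liens in gretna" (rank 2), the rank sum is 4, so "public" and "records" each get 0.25 + 0.25 = 0.5, "search" gets 0.25, and each word of the third query gets 0.125.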
Observation 2 cont’d
Top 40 terms for URL SearchSystems.net
– county, records, court, free, public, florida, cases, michigan, germany, probate, tax, pasco, oregon, nc, indiana, deeds, sheriff, ohio, search, hanover, etowah, criminal, texas, property, warrants, databases
Next step
Obtain more accurate word weights for URLs
– Use the information in query chains to obtain the top terms of URLs
– Use other methods?
Obtain document summaries for several URLs and evaluate the results
Thanks
Discussions and Questions
Can the proposed model eventually give us a fixed document content? (Does the method converge?)
Are there other techniques that might be helpful?
References
[1] Jian-Tao Sun, Dou Shen, Hua-Jun Zeng, Qiang Yang, Yuchang Lu, Zheng Chen: Web-page summarization using clickthrough data. SIGIR 2005: 194-201
[2] Alexandros Ntoulas, Junghoo Cho, Christopher Olston: What's new on the web?: the evolution of the web from a search engine perspective. WWW 2004: 1-12
[3] Junghoo Cho, Sourashis Roy: Impact of search engines on page popularity. WWW 2004: 20-29
[4] G. Pass et al.: A Picture of Search. The First International Conference on Scalable Information Systems, Hong Kong, June 2006. Copyright (2006) AOL
[5] J. Lafferty, C. Zhai: Document Language Models, Query Models, and Risk Minimization. SIGIR 2001