Aruna Balasubramanian, Yun Zhou, W Bruce Croft, Brian N Levine and Arun Venkataramani Department of...
Aruna Balasubramanian, Yun Zhou, W Bruce Croft, Brian N Levine and Arun Venkataramani
Department of Computer Science, University of Massachusetts, Amherst
Web Search From a Bus
Why web search from a bus?
Open access points commonly available
Intermittent internet connectivity from vehicles possible
• no subscription cost
• useful when no other connectivity is available
Web search is the 2nd most common web activity (survey by pewinternet.org)
Connectivity characteristics of testbeds
Goal: Build web search that works despite frequent disconnections and short connectivity durations
Web search process
<your favorite search engine>
Retrieving web…
Retrieving images…
Retrieving…
Adapting to vehicular network
Why challenging?
Interactive
• several exchanges between user and search engine needed
Results imprecise
• response may not be relevant
• difficult to measure relevance
Thedu
• Proxy architecture: sustain interaction
• IR contribution: increase usefulness of returned response
Thedu proxy
Between vehicle and search engine
When proxy receives query request from vehicle
• retrieves URLs and snippets
• prefetches URL contents including images
• stores responses and maintains state
When vehicle connects to proxy
• downloads pending responses
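The proxy behaviour listed above can be sketched in a few lines. This is a minimal in-memory illustration, not Thedu's code: the class name `ProxySketch`, the methods `on_query` and `on_connect`, the `max_responses` limit, and the injected `search` and `fetch` callables are all hypothetical stand-ins for the search engine and for web sites.

```python
from collections import defaultdict, deque


class ProxySketch:
    """Hypothetical in-memory stand-in for the proxy; not Thedu's actual code."""

    def __init__(self, search, fetch):
        # search(query) -> list of (url, snippet); fetch(url) -> page contents.
        # Both callables are assumptions standing in for the search engine
        # and for web sites.
        self.search = search
        self.fetch = fetch
        # State kept per vehicle: responses waiting for the next connection.
        self.pending = defaultdict(deque)

    def on_query(self, vehicle_id, query):
        """Handle a query request: retrieve, prefetch, and store responses."""
        for url, snippet in self.search(query):
            content = self.fetch(url)  # prefetch while the vehicle is away
            self.pending[vehicle_id].append(
                {"query": query, "url": url, "snippet": snippet, "content": content}
            )

    def on_connect(self, vehicle_id, max_responses):
        """Hand over as many pending responses as the contact allows."""
        queue = self.pending[vehicle_id]
        bundle = []
        while queue and len(bundle) < max_responses:
            bundle.append(queue.popleft())
        return bundle


# Usage with made-up results: two responses are stored, one is delivered
# on the first (short) contact and the other stays pending.
proxy = ProxySketch(
    search=lambda q: [("http://example.org/a", "snippet a"),
                      ("http://example.org/b", "snippet b")],
    fetch=lambda url: "<html>" + url + "</html>",
)
proxy.on_query("bus-1", "chants 2007")
first = proxy.on_connect("bus-1", max_responses=1)
```

Keeping the queue on the proxy is what lets a later, unrelated contact pick up where the previous one stopped.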
Client and proxy architecture
[Figure: architecture diagram. Client side (vehicle): user, web interface, store query, process response. Server side (proxy): queries for vehicle, fetch URL/images, prioritize response, pending responses. The proxy contacts the search engine and web sites. Vehicle and proxy communicate over intermittent connectivity; link labels: new queries, queries, response bundles, responses.]
How to prioritize?
Search engines use relevance scores to rank responses
• scores not comparable across queries
Even if a response is relevant it may not be useful
• Query “chants 2007” needs only one response (http://www.netlab.hut.fi/chants-2007/)
Thedu
• Normalize relevance scores: comparable across queries
• Classify query type: to capture user intent
Query-Type classification
Query-type classification
• Homepage query: “cnn”, “chants 2007”
• Non-homepage query: “Harry potter review”
Thedu classifies using URL, snippet and title field
• E.g., “chants 2007” on Google
• <url> http://www.netlab.hut.fi/chants-2007 </url>
• <snippet> Welcome to the home page of the ACM MobiCom workshop on Challenged Networks (CHANTS 2007). </snippet>
• <title> chants workshop </title>
Features indicating a homepage query
• Query terms occur in URL
• All query terms occur in title or snippet
• Less than 3 words
• URL is root
Features indicating a non-homepage query
• Query is in question form
• Top URL is Wikipedia
• Length greater than 3 words
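The features above can be turned into a simple rule-based classifier. The sketch below is only an illustration under stated assumptions: the slide lists the features but not how they are combined, so the vote-counting rule, the function name `classify_query`, the `QUESTION_WORDS` set, and the substring matching are all hypothetical choices, not Thedu's classifier.

```python
from urllib.parse import urlparse

# Hypothetical list used to detect a query in question form.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "which"}


def classify_query(query, top_url, title, snippet):
    """Return "homepage" or "non-homepage" by counting feature votes.

    The features mirror the slide; the voting rule is an assumption.
    """
    terms = query.lower().split()
    parsed = urlparse(top_url)
    url_text = (parsed.netloc + parsed.path).lower()
    text = (title + " " + snippet).lower()

    homepage_votes = [
        any(t in url_text for t in terms),   # query terms occur in URL
        all(t in text for t in terms),       # all terms in title or snippet
        len(terms) < 3,                      # less than 3 words
        parsed.path in ("", "/"),            # URL is root
    ]
    non_homepage_votes = [
        # query is in question form
        query.strip().endswith("?") or (bool(terms) and terms[0] in QUESTION_WORDS),
        "wikipedia.org" in parsed.netloc.lower(),  # top URL is Wikipedia
        len(terms) > 3,                            # more than 3 words
    ]
    score = sum(homepage_votes) - sum(non_homepage_votes)
    return "homepage" if score > 0 else "non-homepage"


# The slide's own example.
chants = classify_query(
    "chants 2007",
    "http://www.netlab.hut.fi/chants-2007",
    "chants workshop",
    "Welcome to the home page of the ACM MobiCom workshop on "
    "Challenged Networks (CHANTS 2007).",
)

# A made-up question-style query, for contrast.
question = classify_query(
    "how does wifi work on a bus",
    "https://en.wikipedia.org/wiki/Wi-Fi",
    "Wi-Fi - Wikipedia",
    "Wi-Fi is a family of wireless network protocols.",
)
```

A learned classifier could weight these features instead of counting them equally; equal votes are used here only to keep the sketch short.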
Relevance score normalization
Modified language model framework
D: Document, Q: Query, C: Collection
Terms labeled in the scoring equation:
• Normalized score
• Kullback-Leibler divergence (distance between Q and D)
• Probability of word occurring in document
• Probability of word occurring in collection
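One standard way to combine the quantities labeled above is to offset the query-document divergence by the query-collection divergence. The form below is an assumption offered for orientation, not necessarily the definition used by Thedu; the symbol S and the summation over query words w are introduced here for illustration.

```latex
S(Q, D) \;=\; \mathrm{KL}(Q \,\|\, C) \;-\; \mathrm{KL}(Q \,\|\, D)
        \;=\; \sum_{w} P(w \mid Q)\,\log \frac{P(w \mid D)}{P(w \mid C)}
```

Under this assumed form, a document scores above zero only when it explains the query's words better than the collection as a whole does, which gives scores from different queries a common reference point.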
Thedu protocol
1. Sort responses in the order of normalized score
2. For response r for query q,
2a. Update
2b. If q is homepage query and do not send
2c. Else send response to vehicle
: expected relevance of all responses sent for a query q
: probability that r is relevant for q
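The steps above can be sketched as follows. The update rule and the stopping condition are not legible on the slide, so the sketch assumes that step 2a adds the response's relevance probability to a running total and that step 2b compares that total with a cut-off; the names `select_responses`, `expected_relevance`, `p_relevant`, and the default `threshold` of 1.0 are all hypothetical.

```python
def select_responses(responses, is_homepage_query, threshold=1.0):
    """Pick which responses for one query to send, in priority order.

    responses: list of (response_id, normalized_score, p_relevant) tuples.
    threshold: hypothetical cut-off on expected relevance (an assumption).
    """
    to_send = []
    expected_relevance = 0.0  # expected relevance of responses for this query

    # Step 1: sort by normalized score, highest first.
    ordered = sorted(responses, key=lambda r: r[1], reverse=True)

    # Step 2: walk the responses in that order.
    for response_id, _score, p_relevant in ordered:
        # Step 2a (assumed form): accumulate the relevance probability.
        expected_relevance += p_relevant
        # Step 2b (assumed form): a homepage query needs only one relevant
        # response, so hold back once the total passes the cut-off.
        if is_homepage_query and expected_relevance > threshold:
            continue
        # Step 2c: otherwise send the response to the vehicle.
        to_send.append(response_id)
    return to_send


# Made-up scores and probabilities for three responses to one query.
example = [("c", 0.1, 0.1), ("a", 0.9, 0.9), ("b", 0.5, 0.5)]
homepage_order = select_responses(example, is_homepage_query=True)
content_order = select_responses(example, is_homepage_query=False)
```

With these made-up numbers the homepage query sends only the top response, while the non-homepage query sends all three in score order.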
Evaluation goals
What is the delay in getting search results?
How many results were relevant to the user?
Evaluation Tools
DieselNet
Indri search engine
TREC (Text Retrieval Conference)
• Predefined web data collection (10G)
• Predefined set of queries (100 homepage + 50 content)
• Relevance judgments (which documents are relevant for query)
Thedu’s query-type classifier accuracy: 88%
Deployment on DieselNet
Thedu vs Proxy-less server
Thedu
• March 26 to March 30
• Bundle responses
• Returns responses in prioritized order
• Maintains state
Proxy-less server
• April 30 to May 5
• Bundle responses
• Returns responses as FIFO
• No state
Connectivity duration
Mean connection duration: 35 sec
Mean disconnection duration: 8 min
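To put the two means in perspective, the short calculation below estimates the share of time a vehicle is connected. It assumes that connected and disconnected periods simply alternate, so the long-run share is the ratio of the means; the variable names are illustrative.

```python
mean_connected_s = 35          # mean connection duration from the slide
mean_disconnected_s = 8 * 60   # mean disconnection duration from the slide

# Assumption: connections and disconnections alternate, so the long-run
# fraction of time connected is the ratio of the two means.
connected_fraction = mean_connected_s / (mean_connected_s + mean_disconnected_s)
print(f"connected about {connected_fraction:.1%} of the time")
```

Under that assumption a vehicle is connected roughly 7% of the time, which is why the order in which pending responses are sent matters.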
Thedu vs Proxy-less architecture
[Figure panels: Thedu; Stateless proxy]
Delay until first relevant response
Extending Thedu
Can we use connectivity among buses to improve throughput?
Are we limited to academic search engines?
• Convince commercial search providers to provide relevance scores
• Or, assign scores based on ranking
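As a sketch of the second option, a score can be derived from a result's position alone. The slide does not say which mapping would be used; reciprocal rank is chosen here purely as one common possibility, and the function name `scores_from_ranking` is hypothetical.

```python
def scores_from_ranking(ranked_urls):
    """Map a ranked result list to stand-in scores using reciprocal rank.

    Reciprocal rank is an assumed mapping, not one named in the slides.
    """
    return {url: 1.0 / rank for rank, url in enumerate(ranked_urls, start=1)}


# Made-up result list from a search engine that exposes only an ordering.
scores = scores_from_ranking(["u1", "u2", "u3", "u4"])
```

Such rank-derived scores depend only on position, so they are comparable across queries by construction but carry no information about how relevant each query's results actually are.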
Are users really happy with search results and delay?
traces.cs.umass.edu
Simulation Results
Inter-meeting times