SMS-Based web Search for Low-end Mobile Devices
Jay Chen
New York University
Lakshmi Subramanian
New York University
Eric Brewer
University of California, Berkeley
Motivation (1)
SMS-based web services are a rapidly growing market, with over 12 million subscribers in July 2008.
A significant fraction of mobile devices in developing regions are still low-cost devices.
Motivation (2)
Existing SMS-based web services perform poorly:
Low accuracy: Google SMS 22.2%, Yahoo! One Search 27.8% (these services target vertical and pre-defined topics)
Long median response time: ChaCha 227.5 seconds (ChaCha hires humans to search the web and answer questions)
Challenges
SMS search suffers from the long-tail phenomenon: in ChaCha, 21% of the queries are verticals and 79% are long-tail.
None of the existing automated SMS search services is a complete solution for search queries across arbitrary topics.
Search queries are inherently ambiguous.
Problem
We seek to build an automated system that is:
Fast (unlike ChaCha)
Accurate (unlike Google SMS and Yahoo! One Search)
Able to return a disambiguated result for queries across arbitrary topics
Related work
Mobile search differs from conventional desktop search:
Click-through rates and search page views were significantly lower
Persistence of mobile users was very low
Diversity of search topics among low-end phone users was much lower
The SMSFind problem differs from the TREC tracks in at least one of three dimensions:
The nature of the input query
The document collection set
The nature of the search result in the query response
System architecture
Run the algorithm and return a snippet
Definitions
Vertical: topics that are pre-defined or popular
Long tail: topics that are not popular
Snippet: any continuous stream of text that fits within a single SMS message (140 bytes)
Hint: a term or a sequence of consecutive terms that indicates what kind of information the user is looking for
SMSFind algorithm
The SMSFind search problem can be characterized as follows:
★ Given an unstructured SMS search query in the form <query, hint> and the top-k pages returned by a search engine, extract a condensed set of text snippets from the response pages that provides an appropriate search response to the query.
This problem definition assumes that a hint is specified for every query. Google SMS has a similar explicit requirement, in which a keyword is specified as the last term; in this paper, the hint can be any term in the query.
SMSFind algorithm
Consider a search query (Q, H), where Q is the search query containing the hint term H.
Let P1, ..., PN represent the textual content of the top N search response pages to Q. Given (Q, H) and P1, ..., PN, the SMSFind snippet extraction algorithm consists of three main steps:
Neighborhood extraction
N-gram ranking
Snippet ranking
Process of SMSFind
Neighborhood extraction
Splitting into snippet tiles: using a 140-byte sliding window
Filtering n-grams: generate n-grams, then filter the set of n-grams along three dimensions: frequency (3), mean rank (ignore low-PageRank n-grams), and minimum distance (10)
Ranking n-grams: Rank(s) = freq(s) + meanrank(s) + mindist(s)
Snippet ranking: based on the cumulative rank of the top-k (k = 5) ranked n-grams within the snippet
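The process lists these steps without spelling out their exact rules. The sketch below illustrates two of them under stated assumptions: splitting page text into snippet tiles with a 140-byte sliding window, and filtering n-grams along the three dimensions. It assumes that a tile is kept only if it contains the hint, and reads the numbers on the slide as thresholds (frequency of at least 3, minimum distance of at most 10 words). `min_meanrank` is a hypothetical cutoff standing in for "low PageRank", and all function names are illustrative.

```python
import string


def snippet_tiles(text, hint, max_bytes=140):
    """Return, for each start word, the longest run of consecutive words that
    fits in max_bytes (UTF-8), keeping only runs that contain the hint.
    Assumption: only tiles containing the hint belong to its neighborhood."""
    words = text.split()
    hint = hint.lower()
    tiles = []
    for start in range(len(words)):
        end = start
        # Grow the window while the joined text still fits in one SMS.
        while end < len(words) and len(" ".join(words[start:end + 1]).encode("utf-8")) <= max_bytes:
            end += 1
        tile = words[start:end]
        if any(w.lower().strip(string.punctuation) == hint for w in tile):
            tiles.append(" ".join(tile))
    return tiles


def filter_ngrams(metrics, min_freq=3, max_dist=10, min_meanrank=0.0):
    """metrics maps ngram -> (freq, meanrank, mindist). Keep n-grams that are
    frequent enough, come from well-ranked pages, and occur near the hint.
    The thresholds are assumed readings of the slide's numbers."""
    return {
        g: (freq, meanrank, mindist)
        for g, (freq, meanrank, mindist) in metrics.items()
        if freq >= min_freq
        and meanrank >= min_meanrank
        and mindist is not None
        and mindist <= max_dist
    }
```

For example, `snippet_tiles("alpha beta over gamma", "over", max_bytes=10)` yields the two overlapping tiles `"beta over"` and `"over gamma"`. Overlapping tiles are all returned on purpose, leaving the choice between them to snippet ranking.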
Generate n-grams
N-grams of 1 to 5 words are generated.

| N-gram | Frequency | Min. distance |
|---|---|---|
| "the" | 2 | 1 |
| "the brown" | 1 | 3 |
| "the brown cow" | 1 | 2 |
| "brown cow jumped" | 1 | 1 |

Table 1: Slicing example for the text "the brown cow jumped over the moon", hint = "over".
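The numbers in Table 1 can be reproduced with a short sketch. It assumes whitespace tokenization, a single-word hint, and that distance is counted in words from the nearest word of an n-gram occurrence to the nearest hint occurrence. These assumptions are inferred from the table's values rather than stated on the slide, and `generate_ngrams` is a hypothetical name.

```python
from collections import defaultdict


def generate_ngrams(text, hint, max_n=5):
    """Return {ngram: (frequency, min_distance_to_hint)} for all 1..max_n-grams.

    Distance is the word gap between the closest word of an n-gram
    occurrence and the closest occurrence of the (single-word) hint;
    it is None if the hint never occurs.
    """
    words = text.lower().split()
    hint_positions = [i for i, w in enumerate(words) if w == hint.lower()]
    freq = defaultdict(int)
    min_dist = {}
    for n in range(1, max_n + 1):
        for start in range(len(words) - n + 1):
            gram = " ".join(words[start:start + n])
            freq[gram] += 1
            for h in hint_positions:
                d = min(abs(i - h) for i in range(start, start + n))
                if gram not in min_dist or d < min_dist[gram]:
                    min_dist[gram] = d
    return {g: (freq[g], min_dist.get(g)) for g in freq}


table = generate_ngrams("the brown cow jumped over the moon", "over")
for gram in ["the", "the brown", "the brown cow", "brown cow jumped"]:
    print(gram, table[gram])
```

With this reading, "the" occurs at word positions 0 and 5 and the hint at position 4, so its minimum distance is |5 - 4| = 1, matching the table.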
N-gram ranking
Three metrics:
Frequency: the number of times the n-gram occurs across all snippets
Mean rank: the sum, over every occurrence of an n-gram, of the PageRank of the page in which it occurs, divided by the n-gram's raw frequency
Minimum distance: the minimum distance between an n-gram and the hint across all occurrences of both
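The three definitions translate directly into an aggregation over n-gram occurrences. In this sketch, `Occurrence` and `ngram_metrics` are hypothetical names. Each occurrence is assumed to carry the score of its source page and its word distance to the nearest hint. The page score is taken as an input; the sketch does not compute PageRank.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Occurrence:
    ngram: str
    page_score: float   # score of the page this occurrence came from (e.g. PageRank)
    hint_distance: int  # words between this occurrence and the nearest hint


def ngram_metrics(occurrences: Iterable[Occurrence]) -> Dict[str, Tuple[int, float, int]]:
    """Return ngram -> (frequency, mean rank, minimum distance)."""
    freq = defaultdict(int)
    score_sum = defaultdict(float)
    min_dist = {}
    for occ in occurrences:
        freq[occ.ngram] += 1                    # raw frequency
        score_sum[occ.ngram] += occ.page_score  # summed page scores
        prev = min_dist.get(occ.ngram, occ.hint_distance)
        min_dist[occ.ngram] = min(prev, occ.hint_distance)
    # Mean rank = summed page scores divided by raw frequency.
    return {g: (freq[g], score_sum[g] / freq[g], min_dist[g]) for g in freq}
```

For instance, an n-gram seen once on a page scored 0.75 at distance 2 and once on a page scored 0.25 at distance 7 gets frequency 2, mean rank 0.5, and minimum distance 2.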
Using the metrics to rank n-grams: an example
If two n-grams s and t have the same frequency but s has a much lower web frequency than t, then s should be ranked higher than t (the intuition behind TF-IDF).
Rank(s) = freq(s) + meanrank(s) + mindist(s), a linear combination of three normalized ranks
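The slide gives the combination only as a sum of three normalized ranks. The sketch below assumes several things. Each metric is converted to a rank position scaled to [0, 1], where 0 is best. Higher frequency and higher mean rank are better, while a smaller minimum distance is better. The TF-IDF idea is applied by weighting frequency with log(total_docs / (1 + web_freq)). `web_freq` and `total_docs` are hypothetical inputs standing in for web document counts, and lower combined scores rank first.

```python
import math
from typing import Dict, List, Tuple


def normalized_ranks(values: Dict[str, float], higher_is_better: bool) -> Dict[str, float]:
    """Map each key to its position in sorted order, scaled to [0, 1] (0 = best)."""
    order = sorted(values, key=values.get, reverse=higher_is_better)
    if len(order) <= 1:
        return {key: 0.0 for key in order}
    return {key: pos / (len(order) - 1) for pos, key in enumerate(order)}


def rank_ngrams(metrics: Dict[str, Tuple[int, float, int]],
                web_freq: Dict[str, int], total_docs: int) -> List[str]:
    """metrics maps ngram -> (freq, meanrank, mindist); returns n-grams best-first."""
    # TF-IDF-style weighting: an n-gram that is rare on the web gets a larger weight.
    weighted_freq = {
        g: freq * math.log(total_docs / (1 + web_freq.get(g, 0)))
        for g, (freq, _, _) in metrics.items()
    }
    r_freq = normalized_ranks(weighted_freq, higher_is_better=True)
    r_mean = normalized_ranks({g: m[1] for g, m in metrics.items()}, higher_is_better=True)
    r_dist = normalized_ranks({g: m[2] for g, m in metrics.items()}, higher_is_better=False)
    # Rank(s) = freq(s) + meanrank(s) + mindist(s), each term a normalized rank.
    rank = {g: r_freq[g] + r_mean[g] + r_dist[g] for g in metrics}
    return sorted(rank, key=rank.get)


metrics = {"ernest hemingway": (4, 0.75, 1), "the": (4, 0.75, 1), "old man": (2, 0.5, 3)}
web = {"ernest hemingway": 99, "the": 999_999, "old man": 9_999}
print(rank_ngrams(metrics, web, total_docs=1_000_000))
```

Here "the" and "ernest hemingway" have the same frequency, but the n-gram that is rarer on the web ranks higher. This is the behavior the slide's example asks for. Ties in any single metric are broken by input order, because Python's sort is stable.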
Snippet ranking
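The process overview describes snippet ranking as based on the cumulative rank of the top-k (k = 5) ranked n-grams within each snippet. The sketch below is one possible reading of that description. The linear weighting (k points for the best n-gram, down to 1 for the k-th) and whole-word matching are assumptions, and `rank_snippets` is a hypothetical name.

```python
import re
from typing import List


def _normalize(text: str) -> str:
    """Lower-case, keep word characters, and pad with spaces for whole-word matching."""
    return " " + " ".join(re.findall(r"[a-z0-9']+", text.lower())) + " "


def rank_snippets(snippets: List[str], ranked_ngrams: List[str], k: int = 5) -> List[str]:
    """Order snippets by the cumulative weight of the top-k n-grams they contain."""
    top = ranked_ngrams[:k]
    # Assumption: the best n-gram is worth k points, the next k - 1, ..., the k-th 1.
    weights = {_normalize(g): k - i for i, g in enumerate(top)}

    def score(snippet: str) -> int:
        padded = _normalize(snippet)
        return sum(w for g, w in weights.items() if g in padded)

    return sorted(snippets, key=score, reverse=True)
```

Given the ranked n-grams `["ernest hemingway", "old man", "sea", "written", "cat"]`, a snippet such as "The Old Man and the Sea was written by Ernest Hemingway." contains four of the top five n-grams and is ranked above snippets that contain one or none.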
How to extract a hint
Data analysis:
95% of 100,000 queries from ChaCha have 14 terms or fewer.
Several common structures can be observed, each with a corresponding transformation rule.
For example, 45% of the queries begin with “what”, and over 80% of those are in standard forms (e.g. “what is”, “what was”, “what are”, “what do”, “what does”).
Example:
“what is a quote by Ernest Hemingway” matches the structure “what is X”; ignoring the stop word “a”, the final <query, hint> is <“ernest hemingway”, “quote”>.
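One transformation rule of this kind can be sketched as follows. The "what" forms come from the slide. The stop-word list, and the choice of the first remaining content word as the hint, are assumptions chosen so that the slide's example produces the stated result. `extract_query_hint` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Standard "what" forms named on the slide.
WHAT_FORMS = ("what is", "what was", "what are", "what do", "what does")
# Illustrative stop-word list (an assumption, not from the slides).
STOP_WORDS = {"a", "an", "the", "by", "of", "for", "about"}


def extract_query_hint(sms: str) -> Optional[Tuple[str, str]]:
    """Turn a 'what is X'-style question into a (query, hint) pair."""
    text = " ".join(re.findall(r"[a-z0-9']+", sms.lower()))
    for form in WHAT_FORMS:
        if text.startswith(form + " "):
            rest = [w for w in text[len(form):].split() if w not in STOP_WORDS]
            if len(rest) >= 2:
                # Assumption: the first content word is the hint; the rest form the query.
                return " ".join(rest[1:]), rest[0]
    return None  # no rule matched; other structures would need their own rules


print(extract_query_hint("what is a quote by Ernest Hemingway"))
```

Requiring a space after each form keeps "what do" from matching questions that start with "what does".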
Implementation
Implementation: 600 lines of Python, using publicly available parsing libraries
Deployment: a front-end to send and receive SMS messages
Set-up: an SMS short code with a local telco in Kenya, routing all SMS requests and responses to and from our server machine
Interfaces: several basic verticals implemented as part of the service, including weather, definitions, local business results, and news (each interface is under 150 lines of Python)
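The deployment pairs an SMS front-end with vertical interfaces (weather, definitions, local business results, news) and long-tail search. The slides do not say how a query is routed. The sketch below assumes that a vertical is selected by a leading keyword such as "weather", and that every reply is cut to one 140-byte SMS. `make_router`, `fit_sms`, and the handler signatures are hypothetical.

```python
from typing import Callable, Dict

SMS_LIMIT_BYTES = 140


def fit_sms(text: str, limit: int = SMS_LIMIT_BYTES) -> str:
    """Truncate to at most `limit` UTF-8 bytes without splitting a character."""
    # errors="ignore" drops a multi-byte character cut off at the boundary.
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def make_router(verticals: Dict[str, Callable[[str], str]],
                long_tail: Callable[[str], str]) -> Callable[[str], str]:
    """Send keyword-prefixed queries to a vertical handler and everything else
    to long-tail search. Keyword matching is an assumption."""
    def route(sms: str) -> str:
        parts = sms.strip().split(maxsplit=1)
        keyword = parts[0].lower() if parts else ""
        rest = parts[1] if len(parts) > 1 else ""
        handler = verticals.get(keyword)
        reply = handler(rest) if handler else long_tail(sms)
        return fit_sms(reply)
    return route


route = make_router({"weather": lambda q: "Weather for " + q + ": sunny"},
                    long_tail=lambda q: "long-tail answer for: " + q)
print(route("weather Nairobi"))
```

Truncating on the encoded bytes, rather than on characters, keeps replies within the 140-byte limit even for non-ASCII text.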
Evaluation
Using ChaCha's sub-topics to focus on long-tail topics
Variety of the topics
It is important to use n-grams to rank snippets.
It is critical to return a snippet rather than an n-gram.
Modifying the queries has a significant effect.
The readability of our snippets is poor
Conclusion
A combination of simple information retrieval algorithms, in conjunction with existing search engines, can provide reasonably accurate search responses for SMS queries.
An evaluation using queries across arbitrary topics shows that SMSFind can answer 57.3% of the queries in the test set.
This work represents a foray into an open and practical research domain.