Pete Bohman Adam Kunk. What is real-time search? What do you think as a class?
TI: AN EFFICIENT INDEXING MECHANISM FOR REAL-TIME SEARCH ON TWEETS
SIGMOD ‘11
C. CHEN ET AL.
Pete Bohman
Adam Kunk
What is real-time search?
What do you think as a class?
Real-Time Search
Definition: A search mechanism capable of finding information in an online fashion as it is produced.
Real-Time Search
In terms of real-time search, what does “online” mean?
Online means that a constant flow of input data is handled, in contrast to batch processing
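The contrast between online and batch handling can be illustrated with a minimal sketch. It is not taken from the TI paper; the function names and the sample posts are hypothetical. The batch version needs the whole data set before it produces anything, while the online version updates its state per arriving post and can be queried after each one.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_term_counts(posts: list[str]) -> Counter:
    """Batch style: the full data set must exist before any result is produced."""
    counts = Counter()
    for post in posts:
        counts.update(post.lower().split())
    return counts


def online_term_counts(stream: Iterable[str]) -> Iterator[Counter]:
    """Online style: update state per arriving post and expose it right away."""
    counts = Counter()
    for post in stream:
        counts.update(post.lower().split())
        yield counts.copy()  # a snapshot is available after every single post


# Hypothetical sample input, standing in for a live stream of posts.
posts = ["earthquake in LA", "big earthquake", "LA traffic"]
snapshots = list(online_term_counts(iter(posts)))
```

After the last post both approaches hold the same counts; the difference is that the online version already had a usable answer after the first post.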
Bing Social Search
What kind of content is fed into real-time search?
Microblogging data (a new type of data):
1. Short temporal life span
2. Little to no context
3. Simple ideas, fast reporting of events
4. Metadata: time, location, relationships
5. Less factual, more opinionated
6. Static posts (vs. dynamic web pages)
7. Furious input rate
8. Often no hyperlink structure, few traditional ranking factors
Real-Time Search Input Data
Example of the kind of input data considered by these real-time search systems: twittervision
Web search engines index webpages periodically.
Expand on this from content on Wiki?
Web search is not concerned with social links between data/users
What makes real-time search systems different?
Web search (Google) – primarily concerned with relevance
Real-time search – concerned with relevance, popularity, temporal immediacy
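One way to picture a ranking that weighs relevance, popularity, and temporal immediacy together is a weighted sum of three scores. The sketch below is an assumption for illustration only, not the ranking function from the TI paper: the `Post` fields, the repost count as a popularity signal, the one-hour half-life, and the weights are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    timestamp: float  # seconds since some epoch
    reposts: int      # hypothetical popularity signal


def relevance(query: str, post: Post) -> float:
    """Fraction of query terms that appear in the post (toy measure)."""
    q_terms = set(query.lower().split())
    p_terms = set(post.text.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def popularity(post: Post) -> float:
    """Dampened repost count, so very popular posts do not dominate."""
    return math.log1p(post.reposts)


def freshness(post: Post, now: float, half_life: float = 3600.0) -> float:
    """Exponential decay: the value halves every `half_life` seconds."""
    age = max(0.0, now - post.timestamp)
    return 0.5 ** (age / half_life)


def score(query, post, now, w_rel=1.0, w_pop=0.5, w_time=1.0):
    """Weighted sum of the three factors; the weights are arbitrary."""
    return (w_rel * relevance(query, post)
            + w_pop * popularity(post)
            + w_time * freshness(post, now))


now = 10_000.0
old = Post("earthquake hits LA", timestamp=now - 7200, reposts=0)
new = Post("earthquake hits LA", timestamp=now, reposts=0)
ranked = sorted([old, new], key=lambda p: score("earthquake", p, now), reverse=True)
```

With identical text and popularity, the newer post ranks first, which is the behavior a relevance-only web ranking would not give.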
What makes real-time search systems different?
Web search engines crawl the web periodically and update their indexes
○ They need to, since data/pages are dynamic
Real-time search engines have a constant stream of input data
○ No need to poll, since the posts are static
What can we do with real-time search engines?
What are the applications of real-time search?
How are people using it?
Different types of queries
Extended Queries
What are the applications of real-time search?
Real-time news reports
Example system: TwitterStand
Twitter is one of the first avenues in which news is reported
Example: ordering of media events for Michael Jackson’s death
○ 911 call, Twitter update, LA Times, mainstream news
Crowdsourcing of first-hand reports: CNN does real-time search to see what people are reporting
What are the applications of real-time search?
Real-time alert systems
Leverage tweet metadata (time, location) to raise alerts
Earthquake localization based on tweets
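A minimal sketch of how time and location metadata could drive an alert: count keyword-matching posts per region inside a sliding time window and fire once a threshold is reached. This is an illustrative assumption, not how the USGS detector or any real system works; the class name, the region labels, the window length, and the threshold are all hypothetical, and posts are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class KeywordAlert:
    """Fire when enough keyword posts arrive from one region within a window."""

    def __init__(self, keyword, window_seconds=60.0, threshold=3):
        self.keyword = keyword.lower()
        self.window = window_seconds
        self.threshold = threshold
        self.recent = defaultdict(deque)  # region -> timestamps of matching posts

    def observe(self, text, timestamp, region):
        """Process one post; return the region if an alert fires, else None."""
        if self.keyword not in text.lower():
            return None
        times = self.recent[region]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return region if len(times) >= self.threshold else None


detector = KeywordAlert("earthquake", window_seconds=60.0, threshold=3)
events = [  # (text, seconds, region): hypothetical sample posts
    ("Earthquake!", 0.0, "LA"),
    ("lunch time", 5.0, "LA"),
    ("did anyone feel that earthquake", 10.0, "LA"),
    ("earthquake drill today", 12.0, "NYC"),
    ("whoa earthquake", 20.0, "LA"),
]
alerts = [detector.observe(*event) for event in events]
```

Only the last post triggers an alert, because it is the third matching post from the same region within 60 seconds.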
Twitter Real-Time Alerts
USGS Twitter Earthquake Detector
Value of real-time search
The estimated value of real-time search is around $33,000,000
This is based on queries posted to the Collecta system
Value is derived from the types of queries entered in real-time search systems, which are then valued using AdWords auctions
Related Work (belong here?)
Partial Indexes
View Materialization
Google and Twitter microblog real-time search engines
Difficulties of Real-Time Search
The problem can be split into two factors:
Efficient indexing in order to provide fast results
Effective ranking in order to return relevant results
Real-Time Search Indexing
How does indexing differ from traditional web indexing?
Microblogging applications such as Twitter produce large volumes of input
○ Not feasible to index every incoming tweet immediately
Need to use selective indexing, based on which tweets are most likely to appear in results
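Selective indexing can be sketched as follows: posts judged likely to appear in results are added to an inverted index right away, and the rest wait in a queue for later batch indexing. This is a simplified illustration under stated assumptions, not the TI implementation. In particular, the "likely to appear" test used here (the post shares a term with a set of frequently queried terms) and the names `SelectiveIndex`, `hot_terms`, and `flush` are hypothetical.

```python
from collections import defaultdict, deque


class SelectiveIndex:
    def __init__(self, hot_terms):
        self.hot_terms = set(hot_terms)   # assumed: terms from frequent queries
        self.inverted = defaultdict(set)  # term -> ids of indexed posts
        self.pending = deque()            # posts waiting for batch indexing

    def _index(self, post_id, text):
        for term in set(text.lower().split()):
            self.inverted[term].add(post_id)

    def add(self, post_id, text):
        """Index immediately only if the post is likely to appear in results."""
        if self.hot_terms & set(text.lower().split()):
            self._index(post_id, text)
        else:
            self.pending.append((post_id, text))

    def flush(self):
        """Batch-index the deferred posts, for example during a quiet period."""
        while self.pending:
            self._index(*self.pending.popleft())

    def search(self, term):
        return sorted(self.inverted.get(term.lower(), set()))


index = SelectiveIndex(hot_terms={"earthquake"})
index.add(1, "earthquake in LA")
index.add(2, "my lunch was great")
before = index.search("lunch")  # post 2 is still deferred
index.flush()
after = index.search("lunch")   # post 2 is now indexed
```

The trade-off is that a deferred post cannot be returned until the batch step runs, so the quality of the "likely to appear" test decides how many relevant results are temporarily missed.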
Real-Time Search Indexing
RDBMS indexing versus real-time indexing
Real-time indexing does not operate on structured data
Indexes in an RDBMS are built on columns, and they are not necessary to return results
○ They just speed up the search
Indexes in real-time applications determine which results are candidates to be returned
TI indexing
(insert data)
Real-Time Search Ranking
How does ranking differ from traditional web ranking?
There are no social relationships in traditional web pages
Typical web search engines rank based on links to a site and links from a site
Website links are not the same as social networking links
Real-Time Search Ranking
Ranking is not necessary in RDBMS systems
RDBMS systems do not favor certain data over others based on selected criteria
RDBMS systems essentially treat all data contained in the database the same
TI Ranking
(insert data)
What are others doing?
Implications/Conclusion
(insert data)
References
(insert data)