Pete Bohman Adam Kunk. What is real-time search? What do you think as a class?

TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets
SIGMOD ’11, C. Chen et al.
Pete Bohman, Adam Kunk

What is real-time search?

What do you think as a class?

Real-Time Search

Definition: A search mechanism capable of finding information in an online fashion as it is produced.

Real-Time Search

In terms of real-time search, what does “online” mean?

Online means that a constant flow of input data is handled as it arrives, in contrast to batch processing.

Bing Social Search
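
To make the online versus batch contrast concrete, here is a minimal sketch, assuming a toy inverted index over whitespace-separated terms. The names `build_batch_index` and `OnlineIndex` are hypothetical and do not come from the TI paper; the sketch only illustrates that an online index makes each post searchable as soon as it arrives, whereas a batch build needs the whole collection first.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_batch_index(posts):
    """Batch style: the whole collection must be available before indexing."""
    index = defaultdict(set)
    for post_id, text in enumerate(posts):
        for term in tokenize(text):
            index[term].add(post_id)
    return index


class OnlineIndex:
    """Online style: each post becomes searchable as soon as it is added."""

    def __init__(self):
        self.index = defaultdict(set)
        self.next_id = 0

    def add(self, text):
        post_id = self.next_id
        self.next_id += 1
        for term in tokenize(text):
            self.index[term].add(post_id)
        return post_id

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


online = OnlineIndex()
online.add("earthquake in LA")
first_hit = online.search("earthquake")   # searchable right after arrival
online.add("another earthquake report")
second_hit = online.search("earthquake")
```

Both versions produce the same kind of index; the difference is only when a post becomes visible to queries.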

What kind of content is fed into real-time search?

Microblogging data (new type of data):

1. Short temporal life span

2. Little to no context

3. Simple ideas, fast reporting of events

4. Metadata: time, location, relationships

5. Less factual, more opinionated

6. Static posts (vs. dynamic web pages)

7. Furious input rate

8. Often no hyperlink structure, few traditional ranking factors

Real-Time Search Input Data

Example of the kind of input data considered by these real-time search systems: twittervision

What makes real-time search systems different?

Web search engines index webpages periodically. (Expand on this from content on Wiki?)

Web search is not concerned with social links between data/users

What makes real-time search systems different?

Web search (Google) – primarily concerned with relevance

Real-time search – concerned with relevance, popularity, temporal immediacy
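
One way to picture a ranking that weighs relevance, popularity, and temporal immediacy together is a weighted sum with a time-decayed freshness term. The sketch below is only an illustration under stated assumptions: the function name `realtime_score`, the weights, and the one-hour half-life are all hypothetical, and this is not the ranking function used by TI or by any production system.

```python
import math


def realtime_score(relevance, popularity, age_seconds,
                   w_rel=0.5, w_pop=0.3, w_time=0.2, half_life=3600.0):
    """Combine three signals into one score in [0, 1].

    relevance and popularity are assumed to be pre-normalized to [0, 1].
    Freshness decays exponentially and halves every `half_life` seconds.
    All weights and the half-life are illustrative assumptions.
    """
    freshness = math.pow(0.5, age_seconds / half_life)
    return w_rel * relevance + w_pop * popularity + w_time * freshness


# Two posts with the same relevance and popularity, different ages.
fresh = realtime_score(relevance=0.8, popularity=0.5, age_seconds=60)
stale = realtime_score(relevance=0.8, popularity=0.5, age_seconds=86400)
```

With equal relevance and popularity, the newer post scores higher, which is the behavior a pure relevance ranking would not capture.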

What makes real-time search systems different?

Web search engines crawl the web periodically and update their indexes

○ They need to, since data/pages are dynamic

Real-time search engines have a constant stream of input data

○ No need to poll, since the posts are static

What can we do with real-time search engines?

What are the applications of real-time search?

How are people using it?

Different types of queries

Extended Queries

What are the applications of real-time search?

Real-time news reports

Example system: TwitterStand

Twitter is one of the first avenues in which news is reported

Example: Michael Jackson’s death, ordering of media events

○ 911 call, Twitter update, LA Times, mainstream news

Crowdsourcing of first-hand reports: CNN does real-time search to see what people are reporting

What are the applications of real-time search?

Real-time alert systems

Leverage tweet metadata (time, location) to raise alerts

Earthquake localization based on tweets
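
As a rough illustration of how time and location metadata could drive an alert, the sketch below counts keyword-matching tweets per location inside a sliding time window and flags a burst. The class name `BurstAlert`, the window length, and the threshold are hypothetical choices; this is not the method used by any real earthquake detector, and it assumes tweets arrive in timestamp order.

```python
from collections import defaultdict, deque


class BurstAlert:
    """Raise an alert when a location sees many matching tweets in a short window."""

    def __init__(self, keyword, window_seconds=60, threshold=3):
        self.keyword = keyword.lower()
        self.window = window_seconds
        self.threshold = threshold
        self.recent = defaultdict(deque)  # location -> timestamps of matching tweets

    def observe(self, timestamp, location, text):
        """Process one tweet; return True if this tweet triggers an alert."""
        if self.keyword not in text.lower():
            return False
        times = self.recent[location]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold


detector = BurstAlert("earthquake", window_seconds=60, threshold=3)
stream = [
    (0, "LA", "Did anyone feel that earthquake?"),
    (5, "NYC", "Lunch time"),
    (10, "LA", "EARTHQUAKE!!"),
    (20, "LA", "earthquake just now, shelves shaking"),
]
alerts = [(t, loc) for t, loc, text in stream if detector.observe(t, loc, text)]
```

Grouping by location is what turns a keyword counter into a crude localizer: the alert names the place where the burst occurred.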

Twitter Real-Time Alerts

USGS Twitter Earthquake Detector

Value of real-time search

The estimated value of real-time search is around $33,000,000

This is based on queries posted to the Collecta system

The value is derived from the types of queries entered in real-time search systems, which are then valued using AdWords auctions

Related Work (belong here?)

Partial Indexes

View Materialization

Google and Twitter microblog real-time search engines

Difficulties of Real-Time Search

The problem can be split into two factors:

1. Efficient indexing, in order to provide fast results

2. Effective ranking, in order to return relevant results

Real-Time Search Indexing

How does indexing differ from traditional web indexing?

The large volumes of input in microblogging applications such as Twitter

○ Not feasible to index every incoming tweet immediately

Need to use selective indexing, indexing first the tweets that are most likely to appear in results
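
The selective-indexing idea can be sketched as follows: tweets judged likely to appear in results are indexed immediately, and the rest are queued and indexed later in a batch. The class `SelectiveIndexer` and its `hot_terms` predicate are hypothetical placeholders for illustration; the actual criterion used by TI is not reproduced here.

```python
from collections import defaultdict, deque


class SelectiveIndexer:
    """Index likely-to-be-returned tweets now; defer the rest to a batch step."""

    def __init__(self, hot_terms):
        # Assumption: "likely to appear" means sharing a term with hot_terms.
        self.hot_terms = {t.lower() for t in hot_terms}
        self.index = defaultdict(set)   # term -> tweet ids
        self.pending = deque()          # tweets waiting for batch indexing

    def _likely_to_appear(self, terms):
        return any(t in self.hot_terms for t in terms)

    def _index(self, tweet_id, terms):
        for t in terms:
            self.index[t].add(tweet_id)

    def ingest(self, tweet_id, text):
        terms = text.lower().split()
        if self._likely_to_appear(terms):
            self._index(tweet_id, terms)
            return "indexed"
        self.pending.append((tweet_id, terms))
        return "deferred"

    def flush(self):
        """Batch-index everything that was deferred; return how many were processed."""
        n = len(self.pending)
        while self.pending:
            tweet_id, terms = self.pending.popleft()
            self._index(tweet_id, terms)
        return n


indexer = SelectiveIndexer(hot_terms=["earthquake"])
r1 = indexer.ingest(1, "earthquake downtown")
r2 = indexer.ingest(2, "my cat is asleep")
searchable_before = sorted(indexer.index.get("cat", set()))
flushed = indexer.flush()
searchable_after = sorted(indexer.index.get("cat", set()))
```

The trade-off is that a deferred tweet is invisible to queries until the next flush, so the quality of the predicate decides how often a relevant tweet is missed.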

Real-Time Search Indexing

RDBMS indexing versus real-time indexing

Real-time indexing is not looking at structured data

Indexes in an RDBMS are built on columns, and they are not necessary to return results

○ They just speed up the search

Indexes in real-time applications determine which results are candidates to be returned
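
The RDBMS half of this comparison can be checked with a small experiment using Python's built-in `sqlite3` module: the same query returns the same rows before and after an index is created, so the index affects speed only. The table and index names (`posts`, `idx_posts_author`) are made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO posts (author, body) VALUES (?, ?)",
    [("ann", "hello"), ("bob", "earthquake"), ("ann", "coffee")],
)

query = "SELECT id, body FROM posts WHERE author = ? ORDER BY id"

# Without an index the database scans the table, but the answer is complete.
rows_without_index = conn.execute(query, ("ann",)).fetchall()

# Adding an index on the column changes how rows are found, not which rows.
conn.execute("CREATE INDEX idx_posts_author ON posts (author)")
rows_with_index = conn.execute(query, ("ann",)).fetchall()
conn.close()
```

In a selectively indexed real-time system the situation is different: a post that has not been indexed cannot be found at all, so the index defines the candidate set.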

TI indexing

(insert data)

Real-Time Search Ranking

How does ranking differ from traditional web ranking?

There are no social relationships in traditional web pages

Typical web search engines rank based on links to a site, and links from a site

Website links are not the same as social networking links
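
For readers unfamiliar with link-based ranking, PageRank is one well-known example; the slides do not name a specific algorithm, so it is used here only as an illustration. The sketch below is a plain power-iteration version over a tiny hypothetical link graph, with the usual damping factor of 0.85 as an assumed default.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: a and b both link to c, and c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
```

Page `c` ends up ranked highest because two pages link to it. A follower relationship between users carries a different meaning than a hyperlink between pages, which is why such a scheme does not transfer directly to microblog posts.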

Real-Time Search Ranking

Ranking is not necessary in RDBMS systems

RDBMS systems do not favor certain data over others based on select criteria

RDBMS systems essentially treat all data contained in the database the same

TI Ranking

(insert data)

What are others doing?

Implications/Conclusion

(insert data)
