Data Cloud Yury Lifshits Yahoo! Research .

Page 1: Data Cloud Yury Lifshits Yahoo! Research .

Data Cloud

Yury Lifshits

Yahoo! Research

http://yury.name

Page 2: Data Cloud Yury Lifshits Yahoo! Research .

My Beliefs

The key challenge in web search is structured search

Part 1: What is structured search?

The key challenge in structured search is collecting data

Part 2: Data distribution & idea of Data Cloud

Part 3: Demo: numeric data distribution

The key challenge in collecting data is incentive design

Part 4: Economics of data distribution

Page 3: Data Cloud Yury Lifshits Yahoo! Research .

Structured Search

Page 11: Data Cloud Yury Lifshits Yahoo! Research .

Data

Structured data

Entity unit:

• Identifier

• Metadata:

– Explicit key-value pairs

– Relational properties

– Evaluation

Semi-structured data

Content unit:

• Body: text, video, audio, or image

• Metadata:

– Explicit key-value pairs

– Relational properties

– Evaluation

Data = data of entities + data of content
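The two unit types above can be sketched as plain data classes. This is only an illustration under assumptions: the class names, field names, and sample values are hypothetical and do not come from the talk; only the split into identifier or body plus three kinds of metadata does.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Metadata:
    """The three metadata kinds listed on the slide (field names are assumed)."""
    key_values: dict[str, Any] = field(default_factory=dict)        # explicit key-value pairs
    relations: list[tuple[str, str]] = field(default_factory=list)  # (relation name, target identifier)
    evaluation: dict[str, float] = field(default_factory=dict)      # e.g. ratings or votes


@dataclass
class EntityUnit:
    """Structured data: an identifier plus metadata."""
    identifier: str
    metadata: Metadata = field(default_factory=Metadata)


@dataclass
class ContentUnit:
    """Semi-structured data: a body (text, video, audio, or image) plus metadata."""
    body: str
    media_type: str
    metadata: Metadata = field(default_factory=Metadata)


# Hypothetical sample records.
concert = EntityUnit(
    identifier="event:sample-concert",
    metadata=Metadata(
        key_values={"city": "SF", "price_usd": 15},
        relations=[("performed_by", "artist:sample-band")],
        evaluation={"popularity": 120.0},
    ),
)
review = ContentUnit(
    body="Great show, short set.",
    media_type="text",
    metadata=Metadata(relations=[("about", concert.identifier)]),
)
```

Here the content unit points back at the entity through a relational property, which is one way to read "Data = data of entities + data of content".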

Page 12: Data Cloud Yury Lifshits Yahoo! Research .

Structured Search

Factoid search: "what's the value of property X of object Y"

Entity hubs

– Domain hubs

Structured object search: "all concerts this weekend in SF under $20 sorted by popularity"

– Time focus

– Ranking focus

– Relations focus

Structured content search: "all videos with Tom Brady", "all comments and blog posts about Bing"
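As a rough illustration of a structured object search, the concert query from the slide can be written as filters plus a sort over entity records. The records, field names, and the `structured_search` helper are all hypothetical; the sketch only shows the shape of such a query, not how any search engine implements it.

```python
from datetime import date

# Hypothetical entity records with explicit key-value metadata.
EVENTS = [
    {"id": "e1", "type": "concert", "city": "SF", "date": date(2009, 6, 13), "price_usd": 15, "popularity": 120},
    {"id": "e2", "type": "concert", "city": "SF", "date": date(2009, 6, 14), "price_usd": 45, "popularity": 900},
    {"id": "e3", "type": "concert", "city": "SF", "date": date(2009, 6, 14), "price_usd": 10, "popularity": 300},
    {"id": "e4", "type": "lecture", "city": "SF", "date": date(2009, 6, 13), "price_usd": 0, "popularity": 50},
    {"id": "e5", "type": "concert", "city": "NYC", "date": date(2009, 6, 13), "price_usd": 12, "popularity": 700},
]


def structured_search(entities, filters, sort_key, descending=True):
    """Keep entities that satisfy every filter, then rank them by one property."""
    matches = [e for e in entities if all(pred(e) for pred in filters)]
    return sorted(matches, key=lambda e: e[sort_key], reverse=descending)


# "all concerts this weekend in SF under $20 sorted by popularity"
weekend = {date(2009, 6, 13), date(2009, 6, 14)}  # an assumed sample weekend
results = structured_search(
    EVENTS,
    filters=[
        lambda e: e["type"] == "concert",
        lambda e: e["city"] == "SF",
        lambda e: e["date"] in weekend,   # time focus
        lambda e: e["price_usd"] < 20,
    ],
    sort_key="popularity",                # ranking focus
)
result_ids = [e["id"] for e in results]
```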

Page 13: Data Cloud Yury Lifshits Yahoo! Research .

Yury’s Wishlist

Business-generated data

• Products, services, news, wishlists, contact data

Reality stream, sensors

• What happened, and where

Expert knowledge

• Glossary, issues, typical solutions, object databases, related objects graph

Events

• Sport, concerts, education, corporate, community, private

Market graph & signals

• Like, interested, use, following, want to buy; votes and ratings

Page 14: Data Cloud Yury Lifshits Yahoo! Research .

Search as a Platform

[Diagram: applications (Classic search, App 1, App 2, App 3, App 4) on top of data layers (Web index, Structured Data, Structured Data), with Post analysis and Query analysis components]

Page 15: Data Cloud Yury Lifshits Yahoo! Research .

Data Cloud

How to collect all structured data in one place?

Page 16: Data Cloud Yury Lifshits Yahoo! Research .

Data Producers

• People: forums, wiki, mail groups, blogs, social networks

• Enterprises: product profiles, corporate news, professional content

• Sensors: GPS modules, web cameras, traffic sensors, RFID

• Transactional data

Page 17: Data Cloud Yury Lifshits Yahoo! Research .

Data Distributors

A data distributor is any technical solution that accumulates, organizes, and provides access to structured and semi-structured data

Data publisher: the original distributor of some data

Data retailer: a consumer-facing distributor of some data

Page 18: Data Cloud Yury Lifshits Yahoo! Research .

Data Consumers

• Humans

– Email

– Aggregators: news, friend feeds, RSS readers

– Search

– Browsing / random walks

• Intelligence projects

– Recommendation systems

– Trend mining

Page 19: Data Cloud Yury Lifshits Yahoo! Research .

Data Cloud

A Data Cloud is a centralized, fully functional data distribution service

Success metric for data cloud strategy = the total “value” of data on the cloud

Page 20: Data Cloud Yury Lifshits Yahoo! Research .

To-Cloud Solutions

• Extraction

– DBpedia.org, “web tables”

• Semantic markup, data APIs

– Yahoo! SearchMonkey

• Feeds

– Yahoo! Shopping

– Disqus.com, js-kit.com, Facebook Connect

• Direct publishing

Page 21: Data Cloud Yury Lifshits Yahoo! Research .

On-Cloud Solutions

• Ontology maintenance

– Freebase

• Normalization, de-duplication, antispam

• Named entity recognition, metadata inference, ranking

• Data recycling (cross-references)

– Amazon Public Data Sets

– Viral license

• Hosted search

– Yahoo! BOSS

Page 22: Data Cloud Yury Lifshits Yahoo! Research .

From-Cloud Solutions

• Search, audience

– Y! SearchMonkey, Google Base

• Data API, dump access, update stream

• Custom notifications

– Gnip.com

• Data cloud as a primary backend

• Access control

– Ad distribution (AT&T and Yahoo! Local deal)

Page 23: Data Cloud Yury Lifshits Yahoo! Research .

Demo: webNumbr.com

Joint work with Paul Tarjan

Page 25: Data Cloud Yury Lifshits Yahoo! Research .

webNumbr.com: Import

• Crawl numbers from the web

– URL + XPath + regex

• Create “numbr pages”

• Update their values every hour

• Keep the history

Anyone can create a numbr: http://webnumbr.com/create
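The "XPath + regex" step can be sketched as follows. This is not webNumbr's actual code: the function name, the sample page, and the pattern are assumptions, and the page is an inline string rather than a fetched URL. It uses only the Python standard library, whose `xml.etree.ElementTree` accepts well-formed markup and a limited XPath subset.

```python
import re
import xml.etree.ElementTree as ET


def extract_number(markup: str, xpath: str, pattern: str) -> float:
    """Locate one node by XPath, then pull a number out of its text with a regex."""
    root = ET.fromstring(markup)
    node = root.find(xpath)
    if node is None:
        raise ValueError(f"no node matches XPath {xpath!r}")
    text = "".join(node.itertext())
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found in {text!r}")
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


# Hypothetical, well-formed sample page standing in for a crawled URL.
SAMPLE_PAGE = """<html><body>
  <div id="stats"><span class="count">Followers: 12,345</span></div>
</body></html>"""

value = extract_number(SAMPLE_PAGE, ".//span[@class='count']", r"([\d,]+(?:\.\d+)?)")

# Keeping the history: one (label, value) entry per refresh.
history = []
history.append(("sample refresh", value))
```

A real crawler would need an HTML-tolerant parser and a scheduler for the hourly refresh; both are left out of this sketch.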

Page 26: Data Cloud Yury Lifshits Yahoo! Research .

webNumbr.com: Export

• Embed code

• Graphs

• Search & browse

• RSS

Page 27: Data Cloud Yury Lifshits Yahoo! Research .

Economics of Data Distribution

Joint work with Ravi Kumar and Andrew Tomkins

Page 28: Data Cloud Yury Lifshits Yahoo! Research .

Network Effect in Two-Sided Markets

Two-sided market: every product serves consumers of two types, A and B

Cross-side network effect: the more type-A users product X has, the more attractive it is for type-B consumers and vice versa

Examples: operating systems, credit cards, e-commerce marketplaces

Reference: "Two-sided network effects: A theory of information product design". G. Parker, M.W. Van Alstyne, N. Bulkley, M. Van Alstyne

Page 29: Data Cloud Yury Lifshits Yahoo! Research .

Basic model

• Distributors D_1, …, D_k

• Each producer/consumer joins only one distributor

• Initial shares (p_1, c_1), …, (p_k, c_k)

• A new consumer selects a distributor with probability proportional to p_i

• A new producer selects a distributor with probability proportional to c_i
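The basic model above can be simulated in a few lines. This is a minimal sketch under assumptions: the function name, the seed, the starting counts, and the choice to add one consumer and one producer per step are all illustrative, since the slide does not specify arrival order or rates.

```python
import random


def simulate_basic_model(producers, consumers, steps, seed=0):
    """Each step adds one consumer and one producer.

    A new consumer picks distributor i with probability proportional to p_i;
    a new producer picks distributor i with probability proportional to c_i.
    """
    rng = random.Random(seed)
    p = list(producers)
    c = list(consumers)
    k = len(p)
    for _ in range(steps):
        # Both draws use the counts from before this step.
        consumer_choice = rng.choices(range(k), weights=p)[0]
        producer_choice = rng.choices(range(k), weights=c)[0]
        c[consumer_choice] += 1
        p[producer_choice] += 1
    return p, c


# Hypothetical initial counts for three distributors.
final_p, final_c = simulate_basic_model([5, 3, 2], [4, 4, 2], steps=1000)
producer_shares = [n / sum(final_p) for n in final_p]
consumer_shares = [n / sum(final_c) for n in final_c]
```

Running it with different seeds gives a feel for how much the final shares depend on early arrivals.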

Page 30: Data Cloud Yury Lifshits Yahoo! Research .

Basic model

[Figure: illustration of the basic model; node labels a1, a2, a3, a4]

Page 31: Data Cloud Yury Lifshits Yahoo! Research .

Market Shares Dynamics

Theorem 1: Market shares will stabilize

Theorem 2: With a super-linear preference rule, the market will tip to one of the distributors

Theorem 3: With a sub-linear preference rule, market shares will flatten
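One way to get intuition for the super-linear and sub-linear cases is to look at a single newcomer's choice probabilities. The slides do not give the exact form of the preference rule; the sketch below assumes a weight of n_i^alpha, with alpha > 1 standing in for super-linear and alpha < 1 for sub-linear. The function name and the counts are hypothetical.

```python
def choice_probabilities(counts, alpha):
    """Probability that a newcomer picks each distributor under weight n_i ** alpha."""
    weights = [n ** alpha for n in counts]
    total = sum(weights)
    return [w / total for w in weights]


counts = [60, 40]  # the leader currently holds a 0.6 share

linear = choice_probabilities(counts, 1.0)        # leader is picked with its current share
super_linear = choice_probabilities(counts, 2.0)  # leader is picked more often than its share
sub_linear = choice_probabilities(counts, 0.5)    # leader is picked less often than its share
```

Under this assumed rule, the leader's advantage compounds when alpha > 1 and erodes when alpha < 1, which matches the direction of the tipping and flattening statements; it is not a proof of either theorem.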

Page 32: Data Cloud Yury Lifshits Yahoo! Research .

External Factor

Preference rule with external factor:

e_i + c_i / (c_1 + … + c_k)

Theorem 4: Market shares will stabilize at e_1 : e_2 : … : e_k
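The stabilization claim can be checked numerically with an expected-value iteration of the rule. This is a sketch under assumptions: it replaces random arrivals with their expectations, applies the same rule symmetrically to consumers (using producer shares), and uses hypothetical names, external factors, and starting counts.

```python
def expected_shares(e, producers, consumers, steps):
    """Iterate expected arrivals under the weight e_i + (share of the other side)."""
    p = list(producers)
    c = list(consumers)
    norm = sum(e) + 1.0  # the weights e_i + share_i sum to sum(e) + 1
    for _ in range(steps):
        p_total = sum(p)
        c_total = sum(c)
        # Weights are computed from the state before this step.
        consumer_w = [e_i + p_i / p_total for e_i, p_i in zip(e, p)]
        producer_w = [e_i + c_i / c_total for e_i, c_i in zip(e, c)]
        for i in range(len(e)):
            c[i] += consumer_w[i] / norm  # expected fraction of one new consumer
            p[i] += producer_w[i] / norm  # expected fraction of one new producer
    return [x / sum(p) for x in p], [x / sum(c) for x in c]


external = [1.0, 2.0, 3.0]  # hypothetical external factors e_1 : e_2 : e_3
p_shares, c_shares = expected_shares(external, [8, 1, 1], [1, 1, 8], steps=20000)
target = [x / sum(external) for x in external]  # 1/6, 2/6, 3/6
```

In this expected-value version the fixed point s_i = (e_i + s_i) / (e_1 + … + e_k + 1) solves to s_i = e_i / (e_1 + … + e_k), so both sides drift toward the ratio of the external factors regardless of the starting counts.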

Page 33: Data Cloud Yury Lifshits Yahoo! Research .

Coalition

Data Cloud

Page 34: Data Cloud Yury Lifshits Yahoo! Research .

Coalitions

Theorem 5: If all market shares are below 1/sqrt(k), a coalition (sharing data) is profitable for all distributors

Corollary: Coalitions are not monotone

Example: 5 : 4 : 1 : 1
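The condition in Theorem 5 is easy to check for a given market. The helper below is a hypothetical sketch that tests only the stated threshold; it does not model profits, and it uses the sizes 5 : 4 : 1 : 1 merely as sample input.

```python
import math


def market_shares(sizes):
    """Convert absolute sizes into shares that sum to 1."""
    total = sum(sizes)
    return [s / total for s in sizes]


def coalition_condition_holds(sizes):
    """True when every distributor's share is below 1/sqrt(k)."""
    shares = market_shares(sizes)
    threshold = 1.0 / math.sqrt(len(shares))
    return all(share < threshold for share in shares)


# k = 4, so the threshold is 0.5; the largest share is 5/11.
holds_for_example = coalition_condition_holds([5, 4, 1, 1])

# One dominant distributor (share 0.7) violates the condition.
holds_with_dominant = coalition_condition_holds([7, 1, 1, 1])
```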

Page 35: Data Cloud Yury Lifshits Yahoo! Research .

Model Variations

• Same-side network effect

• Different p-to-c and c-to-p rules

• Multi-homing (overlapping audiences)

• n^2 vs. n log n revenue models

• Mature market: newcomer rate = departing rate

• Diverse market (many types of producers and consumers)

• Entering and departing distributors

• Directed coalitions

Page 36: Data Cloud Yury Lifshits Yahoo! Research .

Challenges

Page 37: Data Cloud Yury Lifshits Yahoo! Research .

Marketing

• Data demand?

• Data offerings?

• Requirements for distribution technology?

Page 38: Data Cloud Yury Lifshits Yahoo! Research .

Incentive design

• Incentives for data sharing?

• Centralized or distributed?

– For profit or non-profit?

• Data licensing and ownership?

• Monetizing data cloud?

Page 39: Data Cloud Yury Lifshits Yahoo! Research .

More Challenges

Prototyping:

• Data marketplace: open data & data demand

• Search plugins: related objects, glossaries, object timelines

• Publishing tools for structured data

• Data client: structured news, bookmarking, notifications

Tech design:

• Access management

• Namespace design

User interface:

• Structured search UI

• Discovery UI

Page 40: Data Cloud Yury Lifshits Yahoo! Research .

Thanks!

Follow my research:

http://twitter.com/yurylifshits

http://yury.name/blog