Structured Data in Web Search
![Page 1: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/1.jpg)
Structured Data on the Web
Alon Halevy, Google
May 23, 2014
Joint work with: Jayant Madhavan, Cong Yu, Fei Wu, Hongrae Lee, Warren Shen, Anish Das Sarma, Rahul Gupta, Boulos Harb, Zack Ives, Afshin Rostamizadeh, Sree Balakrishnan, Anno Langen, Steven Whang, Mohamed Yahya, and others
![Page 2: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/2.jpg)
Structured Data in Search Results
![Page 3: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/3.jpg)
![Page 4: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/4.jpg)
![Page 5: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/5.jpg)
Set Queries
Chicago restaurants
![Page 6: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/6.jpg)
Association Queries
![Page 7: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/7.jpg)
Data in Movies!
![Page 8: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/8.jpg)
The Knowledge Graph
[Diagram: Knowledge Graph fragment for the entity Brazil, with edge labels capital (Brasilia), population, and mayor, and the values 2014 and 2001]
![Page 9: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/9.jpg)
Query Reformulation
[Diagram: Knowledge Graph fragment for the entity Brazil, with edge labels capital (Brasilia), population, and mayor, and the values 2014 and 2001]
Queries:
• Brazil capital
• What is the capital of Brazil
• “Google, tell me the capital of brazil”
• Brazil nuts
• Culture of Brazil
• “Google, will Brazil win the world cup?”
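One way to picture the reformulation step is as normalizing differently phrased queries to the same (entity, attribute) lookup. The sketch below is illustrative only: the one-entry triple store, the hand-picked stop-word list, and the `answer` function are assumptions made for this example, not the system described in the talk.

```python
from typing import List, Optional

# Hypothetical toy triple store: (entity, attribute) -> value.
TRIPLES = {
    ("brazil", "capital"): "Brasilia",
}

# Filler words dropped before matching; an assumed, hand-picked list.
STOP_WORDS = {"what", "is", "the", "of", "google", "tell", "me"}


def normalize(query: str) -> List[str]:
    """Lowercase, replace punctuation with spaces, and drop filler words."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in query)
    return [tok for tok in cleaned.lower().split() if tok not in STOP_WORDS]


def answer(query: str) -> Optional[str]:
    """Answer only when the remaining tokens are exactly one entity and one attribute."""
    tokens = normalize(query)
    if len(tokens) != 2:
        return None
    first, second = tokens
    return TRIPLES.get((first, second)) or TRIPLES.get((second, first))
```

With this toy store, the three "capital of Brazil" phrasings resolve to the same lookup, while "Brazil nuts", "Culture of Brazil", and the World Cup question return `None` and would fall through to ordinary search.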
![Page 10: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/10.jpg)
Other Sources of Data
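Query: Brazil capital
Sources: Knowledge Graph, Tables, Text
[Diagram: Knowledge Graph fragment for the entity Brazil, with edge labels capital (Brasilia), population, and mayor, and the values 2014 and 2001]
Text example: “The population of Brasilia is 2207718 according to the GeoNames geographical database”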
![Page 11: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/11.jpg)
Answer Queries Directly from Web?
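Query: Brazil capital
Sources: Knowledge Graph, Tables, Text
[Diagram: Knowledge Graph fragment for the entity Brazil, with edge labels capital (Brasilia), population, and mayor, and the values 2014 and 2001]
Text example: “The population of Brasilia is 2207718 according to the GeoNames geographical database”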
![Page 12: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/12.jpg)
The Web vs. the Knowledge Graph
![Page 13: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/13.jpg)
Tables, Tables
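Query: Brazil capital
Sources: Knowledge Graph, Tables, Text
[Diagram: Knowledge Graph fragment for the entity Brazil, with edge labels capital (Brasilia), population, and mayor, and the values 2014 and 2001]
Text example: “The population of Brasilia is 2207718 according to the GeoNames geographical database”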
Fusion Tables: Enabling a broad range of users to create tabular content
WebTables: Finding good HTML tables on the Web
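To make "finding good HTML tables" concrete, here is a minimal sketch that pulls `<table>` elements out of a page and keeps only those that look like relational data. The filter (at least two rows and two columns, equal row widths, mostly non-empty cells) is an assumed heuristic for illustration, not the classifier used by WebTables, and nested tables are not handled. `TableExtractor`, `extract_tables`, and `looks_relational` are hypothetical names.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(html):
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables


def looks_relational(table, min_rows=2, min_cols=2, min_fill=0.8):
    """Assumed heuristic: a rectangular grid with mostly non-empty cells."""
    if len(table) < min_rows:
        return False
    widths = {len(row) for row in table}
    if len(widths) != 1 or widths.pop() < min_cols:
        return False
    total = sum(len(row) for row in table)
    filled = sum(1 for row in table for cell in row if cell)
    return total > 0 and filled / total >= min_fill


# A layout table (navigation link) followed by a small data table.
HTML = """
<table><tr><td><a href="/">Home</a></td></tr></table>
<table>
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td>Brazil</td><td>Brasilia</td></tr>
</table>
"""

good_tables = [t for t in extract_tables(HTML) if looks_relational(t)]
```

Only the second table survives the filter; the first is a one-cell layout table of the kind that makes up much of the HTML `<table>` markup on the Web.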
![Page 14: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/14.jpg)
• City planning
• Sustainability: water, coffee, …
• Crisis response
• Advancing public discourse (e.g., gun control)
• Data philanthropy – corporations encouraged to contribute data for the good of society.
![Page 15: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/15.jpg)
Background for Coffee Examples
![Page 16: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/16.jpg)
Fusion Tables
google.com/fusiontables
[SIGMOD 2010, SIGMOD 2012]
• Goal: an easy-to-use database system that is integrated with the Web.
• Key: support common workflows
– Easy upload (CSV, KML, spreadsheets)
– Sharing (even outside your company)
– Visualizations front and center
– Easy publishing
• Goal 2: Fusion in the data cloud -- discover others’ data and combine with yours.
![Page 17: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/17.jpg)
Coffee Producing Countries
![Page 18: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/18.jpg)
Coffee Consumption Per Capita
![Page 19: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/19.jpg)
![Page 20: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/20.jpg)
Big Data for Regular People
Table Facts:
• English poverty rates: 32,000 wards with a total of 1.8 million vertices. Colors indicate poverty levels.
• 2011 Rioting: 2100 incidents. Colors indicate addresses of rioting and rioters.
Best UK Internet Journalist
Knight-Batten Award for Innovations in Journalism
![Page 21: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/21.jpg)
![Page 22: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/22.jpg)
![Page 23: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/23.jpg)
Crowd Sourcing
![Page 24: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/24.jpg)
![Page 25: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/25.jpg)
![Page 26: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/26.jpg)
Data Integration as Search
![Page 27: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/27.jpg)
![Page 28: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/28.jpg)
Join with Population Data: What is a City?
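The question in the slide title shows up as soon as two tables are joined on a city name. The sketch below joins a hypothetical cafe table with a population table on a normalized key and reports the rows that fail to match. The cafe rows are made up for illustration, the population figure is the one quoted earlier in the talk, and `normalize_key` and `join_on_city` are assumed helper names, not part of Fusion Tables.

```python
def normalize_key(name):
    """Lowercase and collapse whitespace so trivial spelling variants still match."""
    return " ".join(name.lower().replace(",", " ").split())


def join_on_city(left_rows, right_rows, key="city"):
    """Left-join two lists of dicts on a normalized key; return (matched, unmatched)."""
    index = {normalize_key(row[key]): row for row in right_rows}
    matched, unmatched = [], []
    for row in left_rows:
        other = index.get(normalize_key(row[key]))
        if other is None:
            unmatched.append(row)
        else:
            matched.append({**other, **row})
    return matched, unmatched


# Population figure as quoted earlier in the talk.
POPULATION = [{"city": "Brasilia", "population": 2207718}]

# Made-up rows for illustration only.
CAFES = [
    {"city": "Brasilia", "cafe": "Cafe A"},
    {"city": "  BRASILIA", "cafe": "Cafe B"},
    {"city": "Brasilia metropolitan area", "cafe": "Cafe C"},
]

matched, unmatched = join_on_city(CAFES, POPULATION)
```

Case and whitespace differences are absorbed by the normalization, but "Brasilia metropolitan area" stays unmatched: string cleanup cannot decide whether a metropolitan area and a city proper are the same thing, which is a modeling question rather than a formatting one.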
![Page 29: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/29.jpg)
Big Data Integration
Table Facts:
• Texas Counties 2010 Census: 254 counties with 543000 vertices. Colored based on various demographics.
See SIGMOD 2012 paper for details on scaling map visualizations
![Page 30: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/30.jpg)
Crowdsourcing Cafes
![Page 31: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/31.jpg)
HTML Tables
![Page 32: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/32.jpg)
Search Engine for Data Sets
research.google.com/tables
[VLDB 2008, 2011, 2014]
![Page 33: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/33.jpg)
Give Answers from Tables
![Page 34: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/34.jpg)
It Better Be Right!
![Page 35: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/35.jpg)
Answer with a Visualization
![Page 36: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/36.jpg)
Long Term Goal: A Data-Guided Decision Engine
• Support decision making:
– Healthcare debate
– Should I install solar in my house?
– Which charity should I contribute to?
• Show relevant data
– Expose facets of the decision and enable drilldown
– Show opposing views
• Manually curated examples of decision engines:
– Justfacts.com, followthemoney.com, decide.com
![Page 37: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/37.jpg)
WebTables on google.com!
![Page 38: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/38.jpg)
HTML Lists
See Elmeleegy et al., VLDB 2009
![Page 39: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/39.jpg)
Tree Search
Amish quilts
Parking tickets in India
Horses
The Deep Web [Madhavan et al., VLDB 2008]
![Page 40: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/40.jpg)
Other Sources of Data
• Spreadsheets
• CSV files
• Tables embedded in PDF
• XML, RDF
• Visualizations
• Online databases (Fusion Tables, Tableau, …)
Each source has its particularities, but most problems are common to all.
![Page 41: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/41.jpg)
Non-Tabular Data in HTML
![Page 42: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/42.jpg)
Vertical Tables
![Page 43: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/43.jpg)
![Page 44: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/44.jpg)
Data Optimized for Page Layout
![Page 45: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/45.jpg)
Tabular Data Optimized for Site Layout
See [Ling et al., IJCAI 2013] for stitching tables within a site.
![Page 46: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/46.jpg)
Semantics Can Be Brittle
![Page 47: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/47.jpg)
Semantics are in Text
![Page 48: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/48.jpg)
The Big Challenge
• Analyze natural language text as it pertains to structured data.
• Different from (open) information extraction that builds databases entirely from text.
• Good news: natural language parsing technology is now scalable.
![Page 49: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/49.jpg)
First Step: Annotating Columns [Venetis et al., VLDB 2011]
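As a rough illustration of column annotation, the sketch below labels a column with the classes that enough of its cell values belong to, using a small isA lookup. This is a simplified majority-vote scheme under stated assumptions, not the model from the cited paper; the `ISA` dictionary, its labels, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical isA database: cell value -> set of class labels.
ISA = {
    "brazil": {"country", "coffee producer"},
    "colombia": {"country", "coffee producer"},
    "vietnam": {"country", "coffee producer"},
    "brasilia": {"city", "capital"},
}


def annotate_column(cells, isa=ISA, min_support=0.5):
    """Return the labels shared by at least min_support of the cells, sorted by name."""
    if not cells:
        return []
    votes = Counter()
    for cell in cells:
        for label in isa.get(cell.strip().lower(), ()):
            votes[label] += 1
    threshold = min_support * len(cells)
    return sorted(label for label, count in votes.items() if count >= threshold)


labels = annotate_column(["Brazil", "Colombia", "Vietnam", "Unknownland"])
```

Here three of the four cells vote for both labels, so the column is annotated with "coffee producer" and "country" even though one value is missing from the lookup. Tolerating such gaps matters because no isA database covers every cell value found on the Web.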
![Page 50: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/50.jpg)
Step 2: Understanding Relationships
![Page 51: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/51.jpg)
![Page 52: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/52.jpg)
Dictionary of Attributes
• I want the list of all attributes that countries may have.
• Freebase doesn’t have coffee production.
• Is this an ontology?
– Not quite! I want an ontology suited for search.
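One way such an attribute dictionary could be bootstrapped is by mining search queries that mention a known entity and treating the remaining words as a candidate attribute. The sketch below is a deliberately naive version of that idea: the entity list, the year-stripping rule, and the function names are assumptions for illustration, and it makes no claim about how any production system extracts attributes.

```python
import re
from collections import Counter

# Hypothetical list of known entities (lowercase).
ENTITIES = {"el salvador", "brazil"}

# Four-digit years are treated as qualifiers, not as part of the attribute.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def candidate_attribute(query, entities=ENTITIES):
    """Return (entity, attribute) if the query is a known entity plus other words."""
    text = YEAR.sub(" ", query.lower())
    for entity in sorted(entities, key=len, reverse=True):
        pattern = rf"\b{re.escape(entity)}\b"
        if re.search(pattern, text):
            rest = " ".join(re.sub(pattern, " ", text).split())
            return (entity, rest) if rest else None
    return None


def attribute_dictionary(queries, entities=ENTITIES):
    """Count how often each candidate attribute appears across the queries."""
    counts = Counter()
    for query in queries:
        found = candidate_attribute(query, entities)
        if found:
            counts[found[1]] += 1
    return counts


QUERIES = [
    "Coffee production el salvador 2013",
    "El Salvador exports coffee 2013",
    "Brazil capital",
]
attributes = attribute_dictionary(QUERIES)
```

The two El Salvador queries yield "coffee production" and "exports coffee" as separate candidates. A dictionary built for search has to decide whether such phrasings name the same attribute, which is exactly what a hand-built ontology tends not to cover.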
![Page 53: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/53.jpg)
Biperpedia: [VLDB 2014]
Ontology for Search Applications
![Page 54: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/54.jpg)
![Page 55: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/55.jpg)
Comparing to Freebase Coverage
![Page 56: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/56.jpg)
Tower of Babel: Internet Style
Text: “In 2013, the coffee production of El Salvador dropped by 20% due to the coffee rust disease.”
Queries:
• Coffee production el salvador 2013
• El Salvador exports coffee 2013
Sources: Knowledge Graph, Tables, Text
![Page 57: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/57.jpg)
Conclusions
• This was a talk about Big Data:
– Millions of people creating data sets
– Billions of people seeing the data being impacted
• Get out there and find your favorite application.
• Dreams do come true:
– At least as it pertains to structured data on the Web!
![Page 58: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/58.jpg)
![Page 59: Structured Data in Web Search](https://reader035.fdocuments.us/reader035/viewer/2022062300/554e8ca4b4c90573338b4b37/html5/thumbnails/59.jpg)
References
• Fusion Tables: SIGMOD 2010, 2012
• WebTables: VLDB 2008, 2009, 2011