Common Crawl: enabling machine-scale analysis of web data
Lisa Green, Kurt Bollacker, Jordan Mendelson
IIPC 2014-05-19
Photo license: Public Domain Origin: http://en.wikipedia.org/wiki/File:Floppy_disk_2009_G1.jpg
Photo license: CC BY-SA http://commons.wikimedia.org/wiki/File:Img20050526_0007_at_tannheim_cumulus.jpg
Photo license: CC-BY-NC https://www.flickr.com/photos/malloreigh/5580160943
Photo license: CC-BY-SA Origin: http://en.wikipedia.org/wiki/File:Wikimedia_Foundation_Servers-8055_08.jpg
Enable machine-scale access and analysis of web data for everyone
Web Data Commons: “Extracting Structured Data from the Common Crawl”
WikiEntities (Han Xiaogang): In What Context Is a Term Referenced?
WikiEntities Example: Discography. Who are the most popular artists?
How Easily Can Google Analytics Track Our Browsing? (S. Merity, C. Hornbaker)
Data Publica: Finding French Open Data
Commercial Applications: Improved Spell Checking
may be too domain specific
Photo license: CC-BY-NC-ND https://www.flickr.com/photos/blueforce4116/1398245798
Photo license: CC BY-SA http://commons.wikimedia.org/wiki/File:Img20050526_0007_at_tannheim_cumulus.jpg
Photo license: CC-BY Origin: http://en.wikipedia.org/wiki/File:Internet_map_1024.
Image license: CC BY-SA https://www.flickr.com/photos/xdxd_vs_xdxd/6829447421
Photo license: CC-BY-SA https://www.flickr.com/photos/hackny/6202775045