Multilingual scraping from dutch government data

Multilingual Scraping from Open Dutch Government Data. Open Data Day Hackathon Ireland, DERI & 091 labs, Galway, 4 Dec 2010. Tobias Wunner

Transcript of Multilingual scraping from dutch government data

Page 1: Multilingual scraping from dutch government data

Multilingual Scraping from Open Dutch Government Data

Open Data Day Hackathon Ireland, DERI & 091 labs, Galway, 4 Dec 2010

Tobias Wunner

Page 2: Multilingual scraping from dutch government data

Dutch open government data

3 websites, same data but multilingual

Page 3: Multilingual scraping from dutch government data

Dutch Spending Data

JavaScript website

Pixel graphic in PDF

Page 4: Multilingual scraping from dutch government data

Dutch Spending Data

Website

Pixel graphic in PDF

DIFFICULT!

Page 5: Multilingual scraping from dutch government data

• 367 concepts (24 Excel files)

• concept hierarchy

Scrape multilingual concepts

“Long-term interest rate”@en, “Lange Rente”@nl

“International items”@en, “Internationale conjunctur”@nl

super concept
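The idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the scraper from the talk: the `Concept` class and the `pair_labels` helper are hypothetical names, the alignment assumes that the English and Dutch versions list the same concepts in the same order, and the parent/child link between the two example concepts is assumed for the demonstration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """One scraped concept: a label per language and an optional super concept."""
    labels: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Concept"] = None

    def tagged_labels(self) -> List[str]:
        # Render labels as language-tagged literals, e.g. "Lange Rente"@nl.
        return ['"%s"@%s' % (text, lang) for lang, text in sorted(self.labels.items())]


def pair_labels(en_labels: List[str], nl_labels: List[str]) -> List[Concept]:
    # Assumption: both language versions list the same concepts in the same order,
    # so labels can be aligned by position.
    if len(en_labels) != len(nl_labels):
        raise ValueError("language versions differ in length; cannot align by position")
    return [Concept(labels={"en": en, "nl": nl}) for en, nl in zip(en_labels, nl_labels)]


# Labels are taken from the slide; the hierarchy link between them is assumed.
super_concept, child = pair_labels(
    ["International items", "Long-term interest rate"],
    ["Internationale conjunctur", "Lange Rente"],
)
child.parent = super_concept

print(" ".join(child.tagged_labels()))
print("super concept:", child.parent.labels["en"])
```

Aligning by position is the simplest option when the pages share one layout; a shared row identifier would be more robust if the language versions ever diverge.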

Page 7: Multilingual scraping from dutch government data

References

[1] Open Data Day Galway with results: http://www.opendataday.org/wiki/City_Events#Galway

[2] Multilingual scraper for Dutch Government Data: http://scraperwiki.com/scrapers/cpbnl-multilingual-terminology/