Heiko Ehrig: Resources! Resources! Resources!

Post on 07-Jul-2015


Heiko Ehrig (Neofonie) briefly introduced the company, which has shifted from developing search engines to web and mobile application development and consulting, including interaction design, testing and data analytics. Neofonie developed a German text mining API that performs classification, keyword detection, entity detection, date detection, NER, and quote detection (API key: http://bit.ly/txtwerk). Drawing on their experience with NLP and linked data, they point to the following issues for examination:
- extending the set of entity types
- building more individual customer lexica and sentiment detection
- broadening linked data and NLP to more languages than English
- developing a gold standard for German (N)ER
- discussing a standardized text mining API
- supporting open data and open licenses

Transcript of Heiko Ehrig: Resources! Resources! Resources!

Resources! Resources! Resources!

Heiko Ehrig (Head of Research)


Berlin, 1998, first German search engine, 180 people, 2 companies


What we offer


Research Department

✱ 12 Computer Scientists, Linguists, Mathematicians

✱ Text Mining and Analytics, Search
    ✱ Text Classification
    ✱ Named Entities and Concept Tagging
    ✱ Topic Detection and Tracking
    ✱ Sentiment Analysis (Customer's Voice)

✱ Data Analytics & Consulting
    ✱ Individual Projects

txt werk - a Textmining API

✱ Works On German Texts
    ✱ Department Classification
    ✱ Keyword Detection
    ✱ Dates Detection
    ✱ Entity Detection (person, location, organisation)
    ✱ Concepts with Links to Freebase
    ✱ Named Entities with Links to Freebase
    ✱ Quotes

✱ Get Your API Key: http://bit.ly/txtwerk

Resources! Resources! Resources!

✱ German Resources are rare!

✱ Example: Named Entity Linking
    ✱ We did not find a Gold Standard
    ✱ Manual Labeling

✱ ERD Challenge 2014 (SIGIR'14 workshop)
    ✱ Googlers manually reannotated some hundred texts from ClueWeb (data set not public)
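The slides do not say how manually labeled texts were used for evaluation. As a minimal sketch, assuming annotations are compared by exact match on character span and linked entity, a hand-labeled gold standard could be scored as follows. The `Annotation` type, the `score` function and the entity identifiers are hypothetical and not part of txt werk.

```python
from typing import Dict, NamedTuple, Set


class Annotation(NamedTuple):
    """One linked entity mention: character span plus a knowledge-base id."""
    start: int
    end: int
    entity_id: str  # hypothetical identifier, e.g. a knowledge-base key


def score(gold: Set[Annotation], predicted: Set[Annotation]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 of predictions against a gold standard."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Text: "Angela Merkel besucht Berlin."
gold = {
    Annotation(0, 13, "example/angela_merkel"),
    Annotation(22, 28, "example/berlin"),
}
predicted = {
    Annotation(0, 13, "example/angela_merkel"),  # correct span and link
    Annotation(22, 28, "example/berlin_band"),   # correct span, wrong link
}

result = score(gold, predicted)
print(result)
```

Exact matching is the strictest choice; a real evaluation might also give credit for overlapping spans, which this sketch does not attempt.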

Roadmap

✱ More Entity Types (companies, products, brands)
    ✱ Individual Customer Lexica

✱ Sentiment Detection

✱ English and more languages

Wishes to the community

✱ Share your resources!

✱ Corporate-friendly licensing!
    ✱ If you leave academia, share your resources!

✱ Lobby for resources @EC!

✱ Lobby for maintaining resource servers (like meta-share, datahub.io)

✱ Don't forget the Non-English Speaking World!