Computer-Based Text- and Data Analysis€¦ · IBM Watson (wins Jeopardy in 2011) Machine...
Transcript of Computer-Based Text- and Data Analysis€¦ · IBM Watson (wins Jeopardy in 2011) Machine...
Computer-Based Text- and Data AnalysisTechnologies and Applications
Mark Cieliebak962015
2Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
use
3Mark Cieliebak May 2015ZHAW
About Me
Mark Cieliebak
+ Software Engineer amp Data Scientist
+ PhD in Computer Science
+ CIO at Netbreeze (now Microsoft)
+ gt30 scientific publications
Lecturer at ZHAW
CEO of SpinningBytes AG
4Mark Cieliebak May 2015ZHAW
Classical Data Analysis
Peptide Sequencing
Uses IT to
retrieve
store
search
count
calculate
compare
visualize
data
5Mark Cieliebak May 2015ZHAW
Computer-Based Data Analytics
bull Started with Artificial intelligence in the 1960s
bull Analyzes huge amounts of data
bull Uses Machine Learning for
bull Pattern Detection
bull Image and Text Classification
bull Predictive Analysis
bull Data Clustering
bull and many more
6Mark Cieliebak May 2015ZHAW
Applications of Data Analytics
Deep Blue
(beats Kasparow 1997)
IBM Watson
(wins Jeopardy in 2011)
Machine Translation
Selfdriving Cars
Spelling Correction
Internet Search
Recommendation
Systems
Email Spam
Detection
Voice Recognition
DATAANALYTICS
7Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
Memory 64000 Bytes
8589934592 Bytes
Performance 500000 FLOPS
FLOPS
Cost per GFLOP (1984) $4278000000
$008
33863000000000000
1960 2015
8Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
9Mark Cieliebak May 2015ZHAW
Foundations of Data Analytics
Faster Computers + More Data + Better Algorithms
Deep Learning
10Mark Cieliebak May 2015ZHAW
Predicted
Label
Machine Learning in a NutshellT
rain
ing
Ap
pli
ca
tio
n
11Mark Cieliebak May 2015ZHAW
Application Social Media Analysis
Sentiment Analysis
Topic Extraction
Trend Detection
Alerting
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
2Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
use
3Mark Cieliebak May 2015ZHAW
About Me
Mark Cieliebak
+ Software Engineer amp Data Scientist
+ PhD in Computer Science
+ CIO at Netbreeze (now Microsoft)
+ gt30 scientific publications
Lecturer at ZHAW
CEO of SpinningBytes AG
4Mark Cieliebak May 2015ZHAW
Classical Data Analysis
Peptide Sequencing
Uses IT to
retrieve
store
search
count
calculate
compare
visualize
data
5Mark Cieliebak May 2015ZHAW
Computer-Based Data Analytics
bull Started with Artificial intelligence in the 1960s
bull Analyzes huge amounts of data
bull Uses Machine Learning for
bull Pattern Detection
bull Image and Text Classification
bull Predictive Analysis
bull Data Clustering
bull and many more
6Mark Cieliebak May 2015ZHAW
Applications of Data Analytics
Deep Blue
(beats Kasparow 1997)
IBM Watson
(wins Jeopardy in 2011)
Machine Translation
Selfdriving Cars
Spelling Correction
Internet Search
Recommendation
Systems
Email Spam
Detection
Voice Recognition
DATAANALYTICS
7Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
Memory 64000 Bytes
8589934592 Bytes
Performance 500000 FLOPS
FLOPS
Cost per GFLOP (1984) $4278000000
$008
33863000000000000
1960 2015
8Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
9Mark Cieliebak May 2015ZHAW
Foundations of Data Analytics
Faster Computers + More Data + Better Algorithms
Deep Learning
10Mark Cieliebak May 2015ZHAW
Predicted
Label
Machine Learning in a NutshellT
rain
ing
Ap
pli
ca
tio
n
11Mark Cieliebak May 2015ZHAW
Application Social Media Analysis
Sentiment Analysis
Topic Extraction
Trend Detection
Alerting
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
3Mark Cieliebak May 2015ZHAW
About Me
Mark Cieliebak
+ Software Engineer amp Data Scientist
+ PhD in Computer Science
+ CIO at Netbreeze (now Microsoft)
+ gt30 scientific publications
Lecturer at ZHAW
CEO of SpinningBytes AG
4Mark Cieliebak May 2015ZHAW
Classical Data Analysis
Peptide Sequencing
Uses IT to
retrieve
store
search
count
calculate
compare
visualize
data
5Mark Cieliebak May 2015ZHAW
Computer-Based Data Analytics
bull Started with Artificial intelligence in the 1960s
bull Analyzes huge amounts of data
bull Uses Machine Learning for
bull Pattern Detection
bull Image and Text Classification
bull Predictive Analysis
bull Data Clustering
bull and many more
6Mark Cieliebak May 2015ZHAW
Applications of Data Analytics
Deep Blue
(beats Kasparow 1997)
IBM Watson
(wins Jeopardy in 2011)
Machine Translation
Selfdriving Cars
Spelling Correction
Internet Search
Recommendation
Systems
Email Spam
Detection
Voice Recognition
DATAANALYTICS
7Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
Memory 64000 Bytes
8589934592 Bytes
Performance 500000 FLOPS
FLOPS
Cost per GFLOP (1984) $4278000000
$008
33863000000000000
1960 2015
8Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
9Mark Cieliebak May 2015ZHAW
Foundations of Data Analytics
Faster Computers + More Data + Better Algorithms
Deep Learning
10Mark Cieliebak May 2015ZHAW
Predicted
Label
Machine Learning in a NutshellT
rain
ing
Ap
pli
ca
tio
n
11Mark Cieliebak May 2015ZHAW
Application Social Media Analysis
Sentiment Analysis
Topic Extraction
Trend Detection
Alerting
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
4Mark Cieliebak May 2015ZHAW
Classical Data Analysis
Peptide Sequencing
Uses IT to
retrieve
store
search
count
calculate
compare
visualize
data
5Mark Cieliebak May 2015ZHAW
Computer-Based Data Analytics
bull Started with Artificial intelligence in the 1960s
bull Analyzes huge amounts of data
bull Uses Machine Learning for
bull Pattern Detection
bull Image and Text Classification
bull Predictive Analysis
bull Data Clustering
bull and many more
6Mark Cieliebak May 2015ZHAW
Applications of Data Analytics
Deep Blue
(beats Kasparow 1997)
IBM Watson
(wins Jeopardy in 2011)
Machine Translation
Selfdriving Cars
Spelling Correction
Internet Search
Recommendation
Systems
Email Spam
Detection
Voice Recognition
DATAANALYTICS
7Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
Memory 64000 Bytes
8589934592 Bytes
Performance 500000 FLOPS
FLOPS
Cost per GFLOP (1984) $4278000000
$008
33863000000000000
1960 2015
8Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
9Mark Cieliebak May 2015ZHAW
Foundations of Data Analytics
Faster Computers + More Data + Better Algorithms
Deep Learning
10Mark Cieliebak May 2015ZHAW
Predicted
Label
Machine Learning in a NutshellT
rain
ing
Ap
pli
ca
tio
n
11Mark Cieliebak May 2015ZHAW
Application Social Media Analysis
Sentiment Analysis
Topic Extraction
Trend Detection
Alerting
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
5Mark Cieliebak May 2015ZHAW
Computer-Based Data Analytics
bull Started with Artificial intelligence in the 1960s
bull Analyzes huge amounts of data
bull Uses Machine Learning for
bull Pattern Detection
bull Image and Text Classification
bull Predictive Analysis
bull Data Clustering
bull and many more
6Mark Cieliebak May 2015ZHAW
Applications of Data Analytics
Deep Blue
(beats Kasparow 1997)
IBM Watson
(wins Jeopardy in 2011)
Machine Translation
Selfdriving Cars
Spelling Correction
Internet Search
Recommendation
Systems
Email Spam
Detection
Voice Recognition
DATAANALYTICS
7Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
Memory 64000 Bytes
8589934592 Bytes
Performance 500000 FLOPS
FLOPS
Cost per GFLOP (1984) $4278000000
$008
33863000000000000
1960 2015
8Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
9Mark Cieliebak May 2015ZHAW
Foundations of Data Analytics
Faster Computers + More Data + Better Algorithms
Deep Learning
10Mark Cieliebak May 2015ZHAW
Predicted
Label
Machine Learning in a NutshellT
rain
ing
Ap
pli
ca
tio
n
11Mark Cieliebak May 2015ZHAW
Application Social Media Analysis
Sentiment Analysis
Topic Extraction
Trend Detection
Alerting
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
6Mark Cieliebak May 2015ZHAW
Applications of Data Analytics
Deep Blue
(beats Kasparow 1997)
IBM Watson
(wins Jeopardy in 2011)
Machine Translation
Selfdriving Cars
Spelling Correction
Internet Search
Recommendation
Systems
Email Spam
Detection
Voice Recognition
DATAANALYTICS
7Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
Memory 64000 Bytes
8589934592 Bytes
Performance 500000 FLOPS
FLOPS
Cost per GFLOP (1984) $4278000000
$008
33863000000000000
1960 2015
8Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
9Mark Cieliebak May 2015ZHAW
Foundations of Data Analytics
Faster Computers + More Data + Better Algorithms
Deep Learning
10Mark Cieliebak May 2015ZHAW
Predicted
Label
Machine Learning in a NutshellT
rain
ing
Ap
pli
ca
tio
n
11Mark Cieliebak May 2015ZHAW
Application Social Media Analysis
Sentiment Analysis
Topic Extraction
Trend Detection
Alerting
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
7Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
Memory 64000 Bytes
8589934592 Bytes
Performance 500000 FLOPS
FLOPS
Cost per GFLOP (1984) $4278000000
$008
33863000000000000
1960 2015
8Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
9Mark Cieliebak May 2015ZHAW
Foundations of Data Analytics
Faster Computers + More Data + Better Algorithms
Deep Learning
10Mark Cieliebak May 2015ZHAW
Predicted
Label
Machine Learning in a NutshellT
rain
ing
Ap
pli
ca
tio
n
11Mark Cieliebak May 2015ZHAW
Application Social Media Analysis
Sentiment Analysis
Topic Extraction
Trend Detection
Alerting
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
8Mark Cieliebak May 2015ZHAW
Foundation of Data Analytics
9Mark Cieliebak May 2015ZHAW
Foundations of Data Analytics
Faster Computers + More Data + Better Algorithms
Deep Learning
10Mark Cieliebak May 2015ZHAW
Predicted
Label
Machine Learning in a NutshellT
rain
ing
Ap
pli
ca
tio
n
11Mark Cieliebak May 2015ZHAW
Application Social Media Analysis
Sentiment Analysis
Topic Extraction
Trend Detection
Alerting
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
9Mark Cieliebak May 2015ZHAW
Foundations of Data Analytics
Faster Computers + More Data + Better Algorithms
Deep Learning
10Mark Cieliebak May 2015ZHAW
Predicted
Label
Machine Learning in a NutshellT
rain
ing
Ap
pli
ca
tio
n
11Mark Cieliebak May 2015ZHAW
Application Social Media Analysis
Sentiment Analysis
Topic Extraction
Trend Detection
Alerting
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
10Mark Cieliebak May 2015ZHAW
Predicted
Label
Machine Learning in a NutshellT
rain
ing
Ap
pli
ca
tio
n
11Mark Cieliebak May 2015ZHAW
Application Social Media Analysis
Sentiment Analysis
Topic Extraction
Trend Detection
Alerting
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
11Mark Cieliebak May 2015ZHAW
Application Social Media Analysis
Sentiment Analysis
Topic Extraction
Trend Detection
Alerting
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
12Mark Cieliebak May 2015ZHAW
Twitter Facts
Data Access
bull Free Access with Search API
bull Commercial Access to Firehose Stream (all or 10)
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
13Mark Cieliebak May 2015ZHAW
Sentiment Analysis on Twitter
StackOverflow names Apple Swift
the worlds most loved programming
language httpwwwbloombergcomhellip
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
14Mark Cieliebak May 2015ZHAW
Sentiment Analysis Performance of
Commercial Tools (F1-Score)
0
01
02
03
04
05
06
07
Average of All Tools
Best Tool per Corpus
Overall Best Tool (Sentigem)
Experimantal Setup
9 commercial sentiment analysis tools evaluated on 7 public corpora with 28653 short texts
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
15Mark Cieliebak May 2015ZHAW
Take-Home Lesson
Sentiment Tools on short texts
achieve on average an
F1-Score of 51
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
16Mark Cieliebak May 2015ZHAW
Application Newspaper Segmentation
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
17Mark Cieliebak May 2015ZHAW
Application Sales Prediction
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
18Mark Cieliebak May 2015ZHAW
More Applications
Expert Match
Foundation Register Speaker Detection
FaceImage Recognition
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
19Mark Cieliebak May 2015ZHAW
What do Data Scientists Need
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
20Mark Cieliebak May 2015ZHAW
Data Sources
bull Experimental Data
bull Reference Datasets
bull Dictionaires Ontologies Thesaurus
bull Scientific Papers
bull Social Media
bull Media Archives Newspapers Magazines
bull Websites
bull Live Streams Twitter Online News Product Reviews
bull VideosMoviesPictures
bull Books Belletristic Technical Literature
bull Wikipedia
bull Hand-crafted Datasets
bull etc etc
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
21Mark Cieliebak May 2015ZHAW
Data Provisioning
Data Collections
bull Linguistic Data Consortium
bull European Language Ressource Association
bull NISt
bull TIMIT
bull etc
Access Licensing
Open Data
bull Open Governmental Data OGD
bull Open Research Data ORD
bull Linked Open Data
Participation Guidelines
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
22Mark Cieliebak May 2015ZHAW
Data Scientist
Library
Data
analyze
support
use
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
23Mark Cieliebak May 2015ZHAW
Digitizing
Make Information Accessible
bull Extract Text Images Charts Tables
bull Categorize
bull Search
bull Browse
bull (Summarize)
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
24Mark Cieliebak May 2015ZHAW
Data Integration
Combining data from different sources is very time consuming
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
25Mark Cieliebak May 2015ZHAW
SODES Automatic Data Integration
SearchAutomaticIntegration
Data Intake Preview Download
Data enters SODES viabull Linking to eg CKANbull APIbull Crawlingbull User upload
Semantic Context Comprehensionbull Matching of columnsbull Data Quality Improvements
Content based search onbull Full text of databull Full text of meta databull Descriptions of data sets
Easy data exploration enabled bybull All integrable search results in one tablebull Statistical standard plots amp measures
bull Export to variousstandard formats
bull Enables analytics in specialized tools of choice
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
26Mark Cieliebak May 2015ZHAW
Talk in Short
bull Text and Data Analytics is successful
bull Researcher need access to data
bull Data Scientists have powerful tools
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel
27Mark Cieliebak May 2015ZHAW
Thank You
Mark Cieliebak
ZHAW ndash Institute of Applied Information Technology (InIT)
Winterthur Switzerland
Email cielzhawch Website wwwzhawch~ciel