The Role of Data in IS Research
Transcript of The Role of Data in IS Research
![Page 1: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/1.jpg)
The Role of Data in IS Research
Frank Hopfgartner
University of Glasgow
@OkapiBM25
![Page 2: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/2.jpg)
Question
Do you use a dataset for your research?
![Page 3: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/3.jpg)
Intended Learning Outcome
• By the end of this session, you will be able to
– Explain the need for datasets for scientific research
– List components that comprise test collections
– Identify appropriate datasets to answer research hypotheses
– Create your own test collections
![Page 4: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/4.jpg)
Outline
• Importance of Data
• Getting Data
• Using Datasets for IS Research
![Page 5: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/5.jpg)
Why do we use data?
Because it helps us to understand our world
![Page 6: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/6.jpg)
Example: Ngram Viewer
Source: https://books.google.com/ngrams
![Page 7: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/7.jpg)
Example: Online publishing
D. Corney, D. Albakour, M. Martinez, S. Moussa
“What do a Million News Articles look like?” in Proc. NewsIR’16, pp. 42-47, 2016.
Sampling from over 93,000 different news sources recorded in September 2015
Large-scale main News outlets
Single-author Blogs
![Page 8: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/8.jpg)
Summarising: Types of data
Quantitative & Qualitative
Numeric and Textual
Comparison (like with like)
Context
Point-in-time
Longitudinal (series and interval)
![Page 9: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/9.jpg)
Outline
• Importance of Data
• Getting Data
• Using Datasets for IS Research
![Page 10: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/10.jpg)
Example: Opening UK Government
Source: https://data.gov.uk/
![Page 11: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/11.jpg)
Example: UK Data Archive
Over 5,000 data collections
Largely economic and social
Founded in 1967
Office for National Statistics
Medical Research Council
http://www.data-archive.ac.uk/
![Page 12: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/12.jpg)
Example: UK Data Service
https://www.ukdataservice.ac.uk
large-scale government surveys
international macrodata
business microdata
qualitative studies
census data from 1971 to 2011
![Page 13: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/13.jpg)
Non-Public Data
Example: Google Trends
https://www.google.com/trends/home/all/GB
![Page 14: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/14.jpg)
Question
But what if I want to analyse non-public data?
![Page 15: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/15.jpg)
Some people just hack…
http://www.theguardian.com/news/2016/apr/03/what-you-need-to-know-about-the-panama-papers
Disclaimer: This is not an appeal to perform any illegal activities.
![Page 16: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/16.jpg)
Create your own data
• Record data, e.g.,
– Log files of users using information access systems
– Sensor records
– Digitise documents (accepting copyright)
– …
![Page 17: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/17.jpg)
Example: Campus-wide IPTV provider
• Campus wide IPTV provider
• Live and VoD content
• 16 genres
• 33 channels
• Over 7000 different programme names
• Over 500 unique users
J. Yuan, F. Sivrikaya, F. Hopfgartner, A. Lommatzsch, M. Mu. Context-Aware LDA: Balancing Relevance and Diversity in TV Content Recommenders. In Proc. RecSysTV workshop, Vienna, Austria, 2015.
![Page 18: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/18.jpg)
Example: Log user interaction data
[Figure: Category Distribution. Axes: day of week, time of day, categories (Arts, Childrens, Comedy, Drama, Entertainment, Factual, Film, Learning, Lifestyle, Music, News, Null, Religion and Ethics, Sport, Sports, Weather). Colour scale: categories chosen count.]
J. Yuan, F. Sivrikaya, F. Hopfgartner, A. Lommatzsch, M. Mu. Context-Aware LDA: Balancing Relevance and Diversity in TV Content Recommenders. In Proc. RecSysTV workshop, Vienna, Austria, 2015.
![Page 19: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/19.jpg)
Example: Video retrieval platform
F. Hopfgartner, D. Scott, H. Wang, Y. Yang, Z. Zhang, M. Zhou, C. Gurrin. Helping the Helpers: How Video Retrieval Can Assist Special Interest Groups. In Proc. MMM'13: 19th International Conference on Multimedia Modeling, pp. 493-495, 2013.
![Page 20: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/20.jpg)
F. Hopfgartner and J. M. Jose. Semantic User Profiling Techniques for personalised multimedia recommendation. Multimedia Systems 14(4-5):255-274, 2010.
F. Hopfgartner and J. M. Jose. An experimental evaluation of ontology-based user profiles. Multimedia Tools and Applications 73(2):1029-1051, 2014.
![Page 21: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/21.jpg)
Summarising: What do I need to consider?
Documentation
Terms of deposit
Permissions and re-use
Software
Methodology
Time
Place
Sampling
Data collection
Editorial control
Classification
Coding
![Page 22: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/22.jpg)
Outline
• Importance of Data
• Getting Data
• Using Datasets for IS Research
![Page 23: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/23.jpg)
Use Case: Evaluation of Information Access Systems
Information Access System
Input
Output
![Page 24: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/24.jpg)
Examples: Web Search Engines
![Page 25: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/25.jpg)
Example: Social Media Search Engines
![Page 26: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/26.jpg)
Example: Product Search Engines
![Page 27: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/27.jpg)
Examples: Multimedia Search Engines
![Page 28: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/28.jpg)
Example: Libraries
![Page 29: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/29.jpg)
How do we evaluate information access systems?
Test collection:
• Document collection
• Topic set
• Relevance assessments
But how can we compare with state-of-the-art?
System A
System B
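The three components of a test collection, and the comparison of two systems on it, could be sketched roughly as follows. This is a minimal illustration rather than the method of any particular evaluation campaign: the class name `IRTestCollection`, the toy documents, topics and judgments, and the choice of precision@k as the metric are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class IRTestCollection:
    """Hypothetical container for the three components of a test collection."""
    documents: dict   # doc_id -> document text
    topics: dict      # topic_id -> statement of the information need
    qrels: dict       # topic_id -> set of doc_ids judged relevant


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that were judged relevant."""
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / k


def mean_precision_at_k(run, collection, k):
    """Average precision@k of one system's run over all topics."""
    scores = [precision_at_k(run[topic_id], collection.qrels[topic_id], k)
              for topic_id in collection.topics]
    return sum(scores) / len(scores)


# Toy data, invented for illustration only.
collection = IRTestCollection(
    documents={"d1": "text 1", "d2": "text 2", "d3": "text 3", "d4": "text 4"},
    topics={"t1": "open government data", "t2": "lifelogging"},
    qrels={"t1": {"d1", "d3"}, "t2": {"d4"}},
)

# One ranked list per topic for each system (assumed output of System A and B).
run_a = {"t1": ["d1", "d3", "d2", "d4"], "t2": ["d4", "d2", "d1", "d3"]}
run_b = {"t1": ["d2", "d1", "d4", "d3"], "t2": ["d1", "d2", "d4", "d3"]}

score_a = mean_precision_at_k(run_a, collection, k=2)
score_b = mean_precision_at_k(run_b, collection, k=2)
print(f"System A: {score_a:.2f}, System B: {score_b:.2f}")
```

Because both systems are scored against the same documents, topics and relevance assessments, their numbers are directly comparable.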
![Page 30: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/30.jpg)
Evaluation Campaigns
TREC
CLEF
FIRE
NTCIR
Common dataset
Pre-defined tasks
Ground truth
Evaluation protocol
Evaluation metrics
![Page 31: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/31.jpg)
Focus on different domains
Microblogging
Ad-hoc and Web Search
Multimedia
Federated Web Search
XML Retrieval
Information Access in the Legal Domain
Document Similarity
…
![Page 32: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/32.jpg)
Example projects
![Page 33: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/33.jpg)
CLEF Initiative
Source: http://www.isical.ac.in/~fire/2013/slides/other_clef_fire13.pdf
![Page 34: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/34.jpg)
CLEF Tracks
Source: http://www.clef-initiative.eu/track/series
eHealth
ImageCLEF
LifeCLEF
Living Labs for IR (LL4IR)
News Recommendation Evaluation Lab (NewsREEL)
Uncovering Plagiarism, Authorship and Social Software Misuse (PAN)
Social Book Search (SBS)
CLEF ’16
![Page 35: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/35.jpg)
NewsREEL
In CLEF NewsREEL, participants can develop stream-based news recommendation algorithms and have them benchmarked (a) online by millions of users over the period of a few months in a living lab, and (b) offline by simulating a live stream.
F. Hopfgartner, T. Brodt, J. Seiler, B. Kille, A. Lommatzsch, M. Larson, R. Turrin, A. Sereny
“Benchmarking News Recommendations: The CLEF NewsREEL Use Case,” in SIGIR Forum, 49(2):129-136, 2015
![Page 36: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/36.jpg)
Example: News Articles
Source (Image): T. Brodt of plista.com
![Page 37: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/37.jpg)
Profit = Clicks on recommendations
Benchmarking metric: Click-Through-Rate
Request article
Request recommendation
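As a rough illustration of the benchmarking metric, click-through rate can be computed per participant from a log of recommendation requests and clicks. The sketch below assumes CTR is defined as clicks divided by recommendation requests; the event names, the tuple layout and the team labels are hypothetical and not taken from the NewsREEL infrastructure.

```python
from collections import Counter


def ctr_per_team(events):
    """Click-through rate per team from (team, event_kind) pairs.

    Assumes CTR = clicks / recommendation requests; teams that never
    received a request are left out to avoid dividing by zero.
    """
    requests = Counter()
    clicks = Counter()
    for team, kind in events:
        if kind == "recommendation_request":
            requests[team] += 1
        elif kind == "click":
            clicks[team] += 1
    return {team: clicks[team] / count for team, count in requests.items()}


# Toy log, invented for illustration only.
events = [
    ("team_a", "recommendation_request"),
    ("team_a", "recommendation_request"),
    ("team_a", "click"),
    ("team_b", "recommendation_request"),
    ("team_a", "recommendation_request"),
    ("team_b", "recommendation_request"),
    ("team_b", "click"),
    ("team_a", "recommendation_request"),
]

rates = ctr_per_team(events)
for team, rate in sorted(rates.items()):
    print(f"{team}: CTR = {rate:.2f}")
```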
![Page 38: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/38.jpg)
Dataset
• Traffic and content updates of nine German-language news content provider websites
• Traffic: Reading article, clicking on recommendations
• Updates: adding and updating news articles
B. Kille, F. Hopfgartner, T. Brodt, T. Heintz
“The plista Dataset” in Proc. NRS'13: International Workshop and Challenge on News Recommender Systems, Hong Kong, China, pp. 16-23, 2013.
![Page 39: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/39.jpg)
Evaluation using offline dataset
Idomaar
request articles
simulate stream
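The idea of simulating a stream from a recorded dataset can be sketched as a replay loop: recorded events are fed to a recommender in timestamp order, and before each event is revealed the recommender is asked what it would have suggested. This is only a sketch under stated assumptions. It does not use the Idomaar API, the "next article read" hit criterion is a simplification chosen for the example, and the `MostPopular` baseline, the `replay` function and the event layout are hypothetical.

```python
from collections import Counter


class MostPopular:
    """Baseline recommender: suggests the most-read articles seen so far."""

    def __init__(self):
        self.counts = Counter()

    def recommend(self, user, k):
        return [article for article, _ in self.counts.most_common(k)]

    def update(self, user, article):
        self.counts[article] += 1


def replay(events, recommender, k=1):
    """Replay (timestamp, user, article) events in time order.

    For each event the recommender is queried first and only then shown
    the event, so it never sees the future. Returns the fraction of
    events whose article was among the k recommendations.
    """
    hits = 0
    total = 0
    for _, user, article in sorted(events, key=lambda event: event[0]):
        if article in recommender.recommend(user, k):
            hits += 1
        total += 1
        recommender.update(user, article)
    return hits / total if total else 0.0


# Toy recorded log, invented for illustration only.
log = [
    (1, "u1", "a"),
    (2, "u2", "a"),
    (3, "u3", "b"),
    (4, "u1", "a"),
    (5, "u2", "b"),
]

hit_rate = replay(log, MostPopular(), k=1)
print(f"Offline hit rate: {hit_rate:.2f}")
```

Querying before updating is the design choice that matters here: it keeps the offline simulation from leaking information the live system would not have had.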
![Page 40: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/40.jpg)
Example results
B. Kille, A. Lommatzsch, R. Turrin, A. Sereny, M. Larson, T. Brodt, J. Seiler, F. Hopfgartner
“Overview of CLEF NewsREEL 2015: News Recommendation Evaluation Lab,” in Working Notes of CLEF 2015, Toulouse, France, 2015.
![Page 41: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/41.jpg)
Example projects
![Page 42: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/42.jpg)
NTCIR
Source: Hideo Joho
![Page 43: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/43.jpg)
NTCIR-12 Tasks
Second round:
Search-Intent Mining
Mobile Click
Temporal Information Access
Spoken Query & Spoken Document Retrieval
QA Lab for Entrance Exam
First round:
Medical NLP for Clinical Documents
Personal Lifelog Access & Retrieval
Short Text Conversation
![Page 44: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/44.jpg)
LifeLog @ NTCIR-12
Encourage research advances in organising and retrieving from lifelog data.
![Page 45: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/45.jpg)
What is The Quantified Self?
The Quantified Self is about obtaining self-knowledge through self-tracking.
![Page 46: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/46.jpg)
What is The Quantified Self?
Self-tracking is also referred to as lifelogging, self-analysis, or self-hacking.
![Page 47: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/47.jpg)
Example: Visual Lifelogging
![Page 48: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/48.jpg)
Visual Lifelog of a day
2,000 pictures a day
Slide: Cathal Gurrin
![Page 49: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/49.jpg)
Lifelogging Challenges
The challenges are how to sense the person, their actions and their life, and how to make this accessible using appropriate interfaces, search, recommendation engines and visual/aural feedback. A further challenge is exploiting the lifelog to identify context for adaptive information services.
Source (Graphic): DAI-Labor, Berlin
![Page 50: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/50.jpg)
Multimodal dataset with information needs
Created by three individuals over 10+ days
Test collection:
• 18.18GB
• 88,124 images
• Accompanying output of 1,000 concepts (825MB)
• Data processed pre-release (removal of personal content, face blurring, translation of concepts)
• Detailed user queries and judgments generated by the lifelogging data gatherers
C. Gurrin, H. Joho, F. Hopfgartner, L. Zhou, R. Albatal
“NTCIR-Lifelog: The First Test Collection for Lifelog Research”, in Proc. SIGIR'16: ACM International Conference on Information Retrieval, Pisa, Italy, to appear.
![Page 51: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/51.jpg)
Tasks
Evaluate different methods of retrieval and access.
T1: Lifelog Semantic Access (LSAT)
• Models the retrieval need from lifelogs (Known-Item Search)
• Retrieve N segments that match information need
• Interactive or Automatic participation
– Interactive: Time limit for fair and comparative evaluation in an interactive system with users
– Automatic: Fully-automatic retrieval system. Automated query processing
T2: Lifelog Insight
• Models the need for reflection over lifelog data
• Exploratory task, the aim is to:
– encourage broad participation
– novel methods to visualise and explore lifelogs
• Same data as LSAT task
• Presented via demo/poster.
![Page 52: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/52.jpg)
Task 1: Lifelog Semantic Access
Find the moment(s) where I use my coffee machine.
Find the moment(s) where I am in the kitchen.
Find the moment(s) where I am playing with my phone.
Find the moment(s) where I am preparing breakfast.
![Page 53: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/53.jpg)
Task 2: Lifelog Insight Task
Provide insights on the time I spend taking breakfast.
Provide insights on the time I spend driving to work.
Provide insights on the time I spend reading a paper.
Provide insights on the time I spend working on the computer.
![Page 54: The Role of Data in IS Research](https://reader033.fdocuments.us/reader033/viewer/2022052606/58f299411a28ab4f738b457d/html5/thumbnails/54.jpg)
Final thoughts
• Data plays an essential role in scientific research since it is used to prove or disprove a hypothesis
• You are now familiar with various sources where you can get datasets that might be useful for your own research
• When selecting data, question its credibility, e.g., is it biased? Can it be used to support your hypotheses?
• Consider accessibility of the data you want to analyse. Are you allowed to use it? Can others (e.g., other researchers) access the data?