Data Integration in Current Research Information Systems Integration vs. Aggregation Maximilian Stempfhuber GESIS / IZ Social Science Information Centre Bonn, Germany euroCRIS IR Workshop, November 9, 2006


Data Integration in Current Research Information Systems

Integration vs. Aggregation

Maximilian Stempfhuber

GESIS / IZ Social Science Information Centre

Bonn, Germany

euroCRIS IR Workshop, November 9, 2006


Topics

• What users want

• Current information landscape in Germany

• Aggregation vs. Integration: Dealing with heterogeneity

• Model for integrating decentralized and heterogeneous information

• Focus: Semantic level

• Integrating entities

• Coping with sustainability


Looking at the Scientific User

• Spend 0.5 days/week searching for information

• Most frequently used information sources

  • Journals (73%)

  • Internet search engines (71%)

  • Books (67%)

  • Personal / informal communication (52%)

  • Scientific portals / subject gateways (39%)

• Big differences between disciplines

Boekhorst et al. 2003, Poll 2004

Why are Internet search engines preferred to dedicated (research) information systems?


What Scientific Users Want

• Specialized portals (deep indexing, integration)

• Interdisciplinary links (cluster search)

• Intelligent integration (all types of information)

• Quality ("no waste", role of search engines?)

• Quantity + relevance (but no information overload)

• Direct access ("now-or-never", reference + source)

• Communication (invisible colleges)

In line with models from information science

Confirms results from recent surveys

Boekhorst et al. 2003, Poll 2004, IMAC 2002, RSLG 2002, Binder et al. 2001, Stahl et al. 1998, WWW Search Engines: Machill & Welp 2003


Demands of Other Types of Users (Examples)

University or State level

• Overview of all scholars / research units

• Overview of all research projects

• Overview of publications, internal/external co-operations, funding received, …

• Input by users, quality assurance by research officers, automatic reporting, benchmarking, visibility of research, data exchange, …

Federal level

• Research administration

• Benchmarking / rating / ranking of instruments, programs, research organizations and disciplines


Consequences and Difficulties for Building a CRIS

• Different types of information needed (e.g. research units, persons, projects, publications, datasets, co-operations)

• Only parts of the information produced in-house

• Information produced by different groups of people (researchers, administration, funding agency/reviewers)

• Large amounts of data must be externally acquired (e.g. other institutes, publishers, harvesting)

• Data is of different structure and quality, difficult to convert / analyze at a very detailed level; sometimes modification of data not allowed

• Not all data is visible to all users

• Different demands for information access and use

Difficult to convert to a standardized data model

Heterogeneity in the Information Landscape


Current Information Landscape in Germany (extract)

National central libraries

National research collection system (SSG) and Virtual Libraries (funding: DFG)

Information networks (funding: BMBF)

Research institutes


Building a national CRIS by Aggregation (vascoda.de)

www.vascoda.de

Aggregation – Heterogeneity as a Challenge

• Data types

• Indexing languages

• Metadata schemas

• User interfaces

• Technical interfaces

• Natural languages, …

Heterogeneous


Features of Aggregation

• Single point of access to information

• Standardized functions applied to all information

• Features reflect least common denominator

• Enforced standards

• Remaining differences ignored (or lead to exclusion)

• Information entities not connected

Meta search (if distributed)

Data Warehouse (if centralized)
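
The "least common denominator" point above can be illustrated with a small sketch. The source names and field names below are invented for illustration and do not describe any real vascoda or SOWIPORT source. The sketch only shows that an aggregated search can offer just the fields shared by every source.

```python
# Hypothetical field sets for three sources; all names are illustrative only.
SOURCE_FIELDS = {
    "literature_db": {"title", "author", "year", "thesaurus_terms", "abstract"},
    "project_db": {"title", "author", "year", "funding", "duration"},
    "library_catalogue": {"title", "author", "year", "classification"},
}


def common_fields(source_fields):
    """Return the fields available in every source (the least common denominator)."""
    field_sets = list(source_fields.values())
    return set.intersection(*field_sets) if field_sets else set()


def lost_fields(source_fields):
    """For each source, list the fields an aggregated search cannot offer."""
    shared = common_fields(source_fields)
    return {name: sorted(fields - shared) for name, fields in source_fields.items()}


print(sorted(common_fields(SOURCE_FIELDS)))
print(lost_fields(SOURCE_FIELDS))
```

In this toy setup only title, author and year survive aggregation; source-specific fields such as funding or thesaurus terms are the "remaining differences" that are ignored.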


Features of Integration

• Single point of access to information

• Source-specific functions available

• Features reflect different demands

• Enforced standards

• Remaining differences are treated

• Information entities tightly connected

© IBM

Model


SOWIPORT – CRIS for the Social Sciences

Thematic documentation

sowi Reihe, SoFid


SOWIPORT – Core Content

• GESIS

• Literature references and full text documents

• Projects, institutes, journals, WWW resources, …

• Empirical data

• Partners

• …

• Library catalogues

• Open Access journals

• Topic-specific electronic publications

• Deutsche Forschungsgemeinschaft (DFG)

• National licenses to electronic resources


SOWIPORT – Information Architecture 1/2

Theoretical Foundation: Layer Model

Core: SOLIS, FORIS, SoLit, CSA, …

Layer 1: Databases, OA Repository (Self archiving), Harvesting (Metadata), Reviews, Wikipedia, …

Layer 2: Homepages of SOFO-Institutes, Harvesting (Grey literature), …

Layer 3: Google Scholar, MS Academic Search, Scirus, …

Diagram labels: SOWIPORT-Partners; Heterogeneity (intellectual (CC), statistical, …); Content (social sciences, scientific); Content indexing (systematic, not systematic)


SOWIPORT – Information Architecture 2/2

Databases, Publications

Documentation unit (Service), Publication


SOWIPORT – Semantic Integration

Cross-concordances between thesauri (DZI, IZ, CSA)

Query Transformation of the search term "Aktionsforschung":

• SOLIS: Aktionsforschung

• DZI SoLit: Handlungsforschung

• CSA: Action Research

Relevance Ranking
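
The query transformation shown above can be sketched as a simple lookup. This is a minimal illustration, not the SOWIPORT implementation: the dictionary layout, the function name `transform_query`, and the fallback behaviour are assumptions, and only the single term from the slide is included.

```python
# Illustrative cross-concordance holding only the example from the slide.
# The data layout and the database keys are assumptions made for this sketch.
CROSS_CONCORDANCE = {
    "Aktionsforschung": {
        "SOLIS": ["Aktionsforschung"],
        "DZI SoLit": ["Handlungsforschung"],
        "CSA": ["Action Research"],
    },
}


def transform_query(term, databases, concordance=CROSS_CONCORDANCE):
    """Translate one search term into the vocabulary of each target database.

    Falls back to the original term when no mapping is known (an assumption).
    """
    mappings = concordance.get(term, {})
    return {db: mappings.get(db, [term]) for db in databases}


queries = transform_query("Aktionsforschung", ["SOLIS", "DZI SoLit", "CSA"])
for db, terms in queries.items():
    print(f"{db}: {' OR '.join(terms)}")
```

Each database then receives the query in its own indexing vocabulary, and the result lists can be merged and ranked afterwards.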


Treating Heterogeneity Between Indexing Vocabularies

a) intellectual, 1 : n

b) statistical, parallel corpus, n : m

c) statistical search, 1 : n
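
One way to picture a statistical mapping derived from a parallel corpus (case b above) is to count how often terms from two vocabularies are assigned to the same document. The sketch below uses a plain conditional probability as the association measure; that choice, the threshold, the function name and the toy corpus are all assumptions for illustration, since the slides do not state which statistic is used.

```python
from collections import Counter, defaultdict


def cooccurrence_mapping(parallel_corpus, min_probability=0.5):
    """Derive term mappings from documents indexed with two vocabularies.

    parallel_corpus: iterable of (terms_a, terms_b) pairs, one per document.
    Returns {term_a: [(term_b, P(term_b | term_a)), ...]}, best match first.
    """
    term_a_counts = Counter()
    pair_counts = defaultdict(Counter)
    for terms_a, terms_b in parallel_corpus:
        for a in set(terms_a):
            term_a_counts[a] += 1
            for b in set(terms_b):
                pair_counts[a][b] += 1

    mapping = {}
    for a, b_counts in pair_counts.items():
        candidates = [(b, n / term_a_counts[a]) for b, n in b_counts.items()]
        kept = [(b, p) for b, p in candidates if p >= min_probability]
        if kept:
            # Sort by descending probability, then alphabetically for stability.
            mapping[a] = sorted(kept, key=lambda item: (-item[1], item[0]))
    return mapping


# Made-up toy corpus: each document carries terms from vocabulary A and B.
corpus = [
    (["Aktionsforschung"], ["Action Research", "Methodology"]),
    (["Aktionsforschung"], ["Action Research"]),
    (["Aktionsforschung", "Jugend"], ["Action Research", "Youth"]),
    (["Jugend"], ["Youth"]),
]
mapping = cooccurrence_mapping(corpus)
print(mapping)
```

Because one term may keep several candidates above the threshold, and several terms may share a candidate, the resulting relation is n : m rather than 1 : n.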


SOWIPORT – Semantic Integration of Core Content

Cross-concordances between thesauri (DZI, IZ, CSA)

• Methodology for terminology mappings

  • intellectual

  • statistical

  • deductive

• Mapping between vocabularies

  • bilateral ("pure" model)

  • central vocabulary (efficiency)

Thesauri in SOWIPORT

• IZ (SOLIS, FORIS, WZB OPAC)

• DZI (SoLit)

• DZA (GeroLit)

• SWD (SGG OPAC)

• FES (FES OPAC)

• ASSIA (Applied Social Sciences Index and Abstracts)

• PEI (Physical Education Index)

• WPSA (Worldwide Political Science Abstracts)

• CSA (Soc. Abstr., Soc. Serv. Abstr.)

• MADIERA (Surveys)

• EuroThes (IBLK OPAC)

• FIS Bildung

• APA (Psyndex)

• BiSP (SpoLit, SpoFor, SpoMedia)
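
The efficiency argument for a central vocabulary can be made concrete with simple counting. The sketch below assumes that a bilateral model maps every pair of vocabularies once and that the central vocabulary is itself one of the listed thesauri; both are simplifying assumptions, not statements from the slides.

```python
def bilateral_mappings(n):
    """Number of vocabulary pairs when every vocabulary is mapped to every other."""
    return n * (n - 1) // 2


def central_mappings(n):
    """Number of mappings when each vocabulary is mapped to one central vocabulary.

    Assumes the central vocabulary is one of the n vocabularies.
    """
    return max(n - 1, 0)


n = 14  # number of thesauri in the list above
print(bilateral_mappings(n), central_mappings(n))  # prints: 91 13
```

For the 14 thesauri listed, a fully bilateral model would need 91 pairwise cross-concordances, while a hub model needs 13, at the cost of routing every translation through the central vocabulary.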


vascoda: Context for Connecting Disciplines

Pedagogics, Psychology, Economics, Sports, Medicine, …

Cross-concordances between 12 thesauri


SOWIPORT – Structural, Local Integration (Core)

Partners' databases (RDBMS, Allegro, …)

sowiport-XML-Schema

DBClear

Services: Terminology service, Personalization, Authentication, …

Indexing / Retrieval
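
Structural integration of this kind usually means converting each partner's records into one shared schema before indexing. The sketch below is purely illustrative: the element names, the partner keys and the field codes are invented, and it does not reproduce the actual sowiport-XML-Schema or any DBClear interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings from two partner formats to common element names.
# All field names and codes are invented for this sketch.
FIELD_MAPS = {
    "partner_rdbms": {"TITEL": "title", "AUTOR": "author", "JAHR": "year"},
    "partner_allegro": {"20": "title", "40": "author", "76": "year"},
}


def to_common_xml(source, record):
    """Convert one partner record (a dict) into a common-schema XML element."""
    field_map = FIELD_MAPS[source]
    doc = ET.Element("document", attrib={"source": source})
    for source_field, value in record.items():
        target = field_map.get(source_field)
        if target is None:
            continue  # unmapped fields are skipped in this sketch
        ET.SubElement(doc, target).text = str(value)
    return doc


element = to_common_xml(
    "partner_rdbms",
    {"TITEL": "Aktionsforschung", "JAHR": 2005, "INTERN": "x"},
)
print(ET.tostring(element, encoding="unicode"))
```

Keeping the source name on each converted record is one way to retain source-specific functions after the data has been brought into a common structure.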


Self Archiving in SOWIPORT

Diagram labels: Integrated Search; Product Catalogue; SOLIS; Literature Search; Communication; Homepages; CV + Publications; (Self archiving) Full text; Repository; Metadata; SOFO; Affiliation

• Initial motivation: WR Evaluation

• Sustainability: Incentives


Self Archiving and Evaluation

SOLIS + CSA

DBClear

WR supplies names of universities and researchers

• Retrieval of publications

• Quality control

Review and additions by researchers

Evaluation by WR

Perspective:

• Basis for scholars' homepages / Who-is-who

• Self archiving in Open Access Repository

Transfer of metadata to the SOWIPORT core


Reflecting Integration at the UI Level


Conclusions

• Integration goes well beyond aggregation

• Shift from data orientation to an information use perspective necessary

• Challenges

  • Deal with heterogeneity at different levels

  • Integrate primary data with publications, …

  • Organize information sharing / access / sustainability

• Emerging infrastructures allow for integration

• Licensing and access issues are still a problem

Thank You!

Dr. Maximilian Stempfhuber

www.gesis.org/IZ