Using Search Engines and Web Crawlers in Social Science Research Mike Thelwall Head, Statistical...

Post on 21-Dec-2015


Using Search Engines and Web Crawlers in Social Science Research

Mike Thelwall

Head, Statistical Cybermetrics Research Group

University of Wolverhampton, UK

http://linkanalysis.wlv.ac.uk

RC33, August 2004

Link Analysis in Social Science Research

Use to study web phenomena
  E.g. NGO web site interlinking
  E.g. university web site interlinking
Use to study offline phenomena with web aspects
  E.g. scholarly communication
  E.g. the perception of news events
The web is a free, accessible, massive data source for information about many aspects of life

What use is hyperlink data to qualitative researchers?

Part of a mixed methodology
Numbers to back up theories
To obtain samples of types of Web pages for qualitative analyses
Background information on how the Web is used

Quick example 1:

UK university interlinking with geographic clusters indicated

Quick example 2:

Asia-Pacific university interlinking.

{Research with Alastair Smith, VUW, NZ}

Quick example 3:

Geographic interlinking trends for UK universities.

Talk overview

A social science approach for link analysis
Data collection with commercial search engines
Data collection and analysis with SocSciBot

A social science approach for link analysis 1: Preliminary steps

1. Formulate an appropriate research question, taking into account existing knowledge of web structure
2. Conduct a pilot study
3. Identify web pages or sites that are appropriate to address a research question
4. Collect link data from a commercial search engine or a personal crawler, taking appropriate safeguards to ensure that the results obtained are accurate

A social science approach for link analysis 2: Validation

5. Partially validate the link count results through correlation tests

6. Partially validate the interpretation of the results through a link classification exercise or web author interviews
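
The correlation test in step 5 can be sketched as a rank correlation between link counts and some independent measure for the same set of sites. The slides do not say which statistic or which comparison measure was used, so the choice of Spearman's coefficient, the function names and the toy numbers below are all assumptions made for illustration, not the author's procedure.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list:
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: inlink counts for five sites and an
# independent offline score for the same sites (made-up values).
inlink_counts = [120, 45, 300, 80, 15]
offline_score = [3.1, 2.0, 4.5, 2.2, 1.0]
rho = spearman(inlink_counts, offline_score)
```

A high coefficient would only support the link counts as partially valid; the classification exercise in step 6 is still needed to interpret what the links mean.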

A social science approach for link analysis 3: Reporting

8. Report results with an interpretation consistent with the link classification exercise
   Include either a detailed description of the classification or exemplars to illustrate the categories

9. Report the limitations of the study and parameters used in data collection and processing

Link data from commercial search engines

Commercial search engines can give information about the existence of links in the web
Can be used for data collection
Advanced interfaces are usually needed, or special commands

Google

Can find all links to a given web page with the link: command
  E.g. link:http://www.siswo.uva.nl/rc33/

Yahoo! site-specific searches

Yahoo! allows searching for links between pairs of web sites/web spaces
  E.g. linkdomain:db.dk +site:ac.uk returns web pages in the ac.uk domain that link to the db.dk site

…ac.uk/… …db.dk/…
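
The two query forms shown on these slides can be generated for a whole set of sites. The sketch below only builds the query strings exactly as they are written on the slides; it does not send them anywhere, and whether a given search engine currently honours these operators is outside its scope. The function names and the example.* domains are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterable, Tuple


def google_link_query(url: str) -> str:
    """Query for pages linking to one URL, in the slide's link: form."""
    return f"link:{url}"


def yahoo_interlink_query(target_domain: str, source_domain: str) -> str:
    """Query for pages in source_domain that link to target_domain,
    in the slide's 'linkdomain:TARGET +site:SOURCE' form."""
    return f"linkdomain:{target_domain} +site:{source_domain}"


def pairwise_queries(domains: Iterable[str]) -> Dict[Tuple[str, str], str]:
    """One query per ordered (source, target) pair of distinct domains."""
    return {
        (source, target): yahoo_interlink_query(target, source)
        for source, target in permutations(list(domains), 2)
    }


# The two examples from the slides.
print(google_link_query("http://www.siswo.uva.nl/rc33/"))
print(yahoo_interlink_query("db.dk", "ac.uk"))

# Hypothetical set of three sites: six ordered pairs, six queries.
for pair, query in pairwise_queries(["a.example", "b.example", "c.example"]).items():
    print(pair, query)
```

Generating every ordered pair matters because links are directional: pages in site A linking to site B are counted separately from pages in B linking to A.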

SocSciBot

Personal crawler for link research
Available free at socscibot.wlv.ac.uk
Crawls sets of web sites and analyses the links between them, producing:
  Link lists
  Link counts
  Network diagrams
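
The step from a link list to link counts can be illustrated with a few lines of code. This is a minimal sketch of the general idea, not SocSciBot's implementation: it assumes the link list is already available as (source URL, target URL) pairs, treats a host name with any leading "www." removed as the "site", and uses made-up example.* URLs.

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Reduce a URL to a site name: lower-cased host without a leading 'www.'.
    (An assumed, simplistic definition of 'site' for this sketch.)"""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def intersite_link_counts(links: Iterable[Tuple[str, str]]) -> Counter:
    """Count links between distinct sites, keyed by (source site, target site)."""
    counts: Counter = Counter()
    for source_url, target_url in links:
        source, target = site_of(source_url), site_of(target_url)
        # Skip unparseable URLs and links that stay within one site.
        if source and target and source != target:
            counts[(source, target)] += 1
    return counts


# Hypothetical link list (made-up URLs).
link_list = [
    ("http://www.a.example/p1", "http://b.example/"),
    ("http://a.example/p2", "http://www.b.example/q"),
    ("http://a.example/p2", "http://a.example/p3"),  # internal link, ignored
    ("http://b.example/", "http://a.example/"),
]
counts = intersite_link_counts(link_list)
for (source, target), n in sorted(counts.items()):
    print(f"{source} -> {target}: {n}")
```

The resulting (source, target) counts are also the weighted edge list that a network diagram would be drawn from.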

Reprise: Link Analysis in Social Science Research

Use to study web phenomena
  E.g. NGO web site interlinking
  E.g. university web site interlinking
Use to study offline phenomena with web aspects
  E.g. scholarly communication
  E.g. the perception of news events
The web is a free, accessible, massive data source for information about many aspects of life

But don’t forget the need for validation!