What is Webometrics?
Mike Thelwall
Statistical Cybermetrics Research Group
University of Wolverhampton, UK
Virtual Knowledge Studio (VKS)
Information Studies
1. Introduction
□Webometrics is concerned with gathering data on and measuring aspects of the Web
  □web sites
  □web pages
  □hyperlinks
  □web search engine results
  □YouTube video commenter networks
  □MySpace Friend networks
□…for very varied social science purposes
New problems: Web-based phenomena
□Webometrics can be applied to understanding web-based phenomena
  □Why do web sites interlink?
  □Which web sites interlink?
  □What interlinking patterns exist?
  □What topics are frequently blogged about?
Old problems: Offline phenomena reflected online
□Some offline phenomena have measurable online reflections
  □International communication
  □Inter-university collaboration
  □University-business collaboration
  □The impact or spread of ideas
  □Public opinion
2. Examples
Blog searching - blogpulse.com
Example: Identifying and tracking public science concerns in blogs
□Over 100,000 blogs and other sources tracked daily via RSS feeds
□Objective: to identify and track public concerns about science
□E.g., “Schiavo” identified and tracked as potential public science concern
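The tracking idea above can be sketched in a few lines. This is a minimal illustration only, not the method used in the project: the function names `daily_mention_share` and `spike_days`, the input format (a list of `(date, text)` pairs already extracted from feeds) and the "twice the mean" spike rule are all assumptions made for the example.

```python
from collections import Counter


def daily_mention_share(posts, keyword):
    """Return {day: fraction of that day's posts that mention keyword}.

    posts is an iterable of (day, text) pairs; this input format is an
    assumption for the sketch, not something the slides specify.
    """
    totals = Counter()
    hits = Counter()
    needle = keyword.lower()
    for day, text in posts:
        totals[day] += 1
        if needle in text.lower():
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


def spike_days(shares, factor=2.0):
    """Return the days whose share exceeds factor times the mean share.

    The threshold rule is a hypothetical choice for illustration.
    """
    if not shares:
        return []
    mean_share = sum(shares.values()) / len(shares)
    if mean_share == 0:
        return []
    return [day for day, share in shares.items() if share > factor * mean_share]
```

Using the share of posts rather than the raw count keeps a day with an unusually large crawl from looking like a spike in interest.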
Example: The online impact of research groups (NetReAct)
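Example: Links between EU universities
[Figure: network diagram of links between EU universities, by country. Normalised linking, smallest countries removed. Annotation: "Geopolitical connected". Countries shown: Sweden, Finland, Norway, UK, Germany, Austria, Switzerland, Poland, Italy, Belgium, Spain, France, NL.]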
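The slide does not say how the link counts were normalised. One common possibility, shown here purely as an assumption, is to divide each country-to-country link count by the product of the two countries' sizes, after dropping the smallest countries. The function name `normalised_links`, the input dictionaries and the `min_size` cut-off are all hypothetical.

```python
def normalised_links(raw_counts, sizes, min_size=0):
    """Divide each directed link count by the product of both countries' sizes.

    raw_counts: {(source, target): number of links}      (hypothetical input)
    sizes:      {country: size measure, e.g. pages}      (hypothetical input)
    min_size:   countries smaller than this are removed before normalising
    """
    kept = {c for c, s in sizes.items() if s > 0 and s >= min_size}
    return {
        (src, dst): count / (sizes[src] * sizes[dst])
        for (src, dst), count in raw_counts.items()
        if src in kept and dst in kept and src != dst
    }
```

Without some normalisation of this kind, the largest university systems would dominate the diagram simply because they have more pages to link from and to.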
International biofuels research network
Example: MySpace age profiles
Percentage of profiles containing swearing

                    moderate   strong   very strong   sample size
US males 16-19        10%       47%         2%           1,530
US females 16-19      11%       38%         2%           1,287
UK males 16-19        33%       33%         8%             171
UK females 16-19      18%       38%         3%             130

(typical sample size 20-148 for non-web swearing research)
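To see why the sample sizes in the table matter, the uncertainty around a reported percentage can be estimated. The sketch below uses the Wilson score interval, which is a choice made for this illustration and not something the slides mention. The inputs are the rounded "strong" percentages from the table, so the intervals are approximate.

```python
import math


def wilson_interval(p, n, z=1.96):
    """Wilson score interval for a proportion p observed in n profiles.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# "strong" column from the table above: (rounded proportion, sample size)
strong = {
    "US males 16-19": (0.47, 1530),
    "US females 16-19": (0.38, 1287),
    "UK males 16-19": (0.33, 171),
    "UK females 16-19": (0.38, 130),
}

for group, (p, n) in strong.items():
    low, high = wilson_interval(p, n)
    print(f"{group}: {p:.0%} (approx. 95% interval {low:.1%} to {high:.1%})")
```

The UK intervals come out noticeably wider than the US ones because the UK samples are roughly a tenth of the size.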
Emphatic adverb/adjective OR adverbial booster OR premodifying intensifying negative adjective (36% of swearing)
□and we r guna go to town again n make a ryt fuckin nyt of it again lol
□see look i'm fucking commenting u back
□lol and stop fucking tickleing me!!
□Thanks for the party last night it was fucking good and you are great hosts.
□That 50's rock and roll weekender was fucking mint!
□Fuckin my space, my arse
□1/2 d ppl cudnt even speak fuckin english!
□yeah so me and sarah broke up and everythings fucking shit
YouTube – Video poster ages
YouTube friend network
Online impact - Keywords in web pages mentioning IWRM
Data Gathering/Processing Tools
□Blogpulse.com – blog network diagrams
□LexiURL Searcher – links, web text, YouTube, Flickr, Technorati
□Issue Crawler, Google TouchGraph - links
Discussion points for online data
□ Validity – is the underlying meaning of the text/video/picture readily apparent to the researcher?
  □ Possibly not to any great degree for teenagers’ MySpace comments or very personal YouTube videos
□ Reliability – are search engines accurate/good at returning the correct results?
  □ Google blog search shows unreliability – very variable over time
  □ Researchers can triangulate across similar search engines, or across time, to test reliability
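The triangulation point can be made concrete with two simple checks: how much two engines' result lists overlap, and how much one engine's hit counts vary across repeated queries. Both measures (Jaccard overlap and the coefficient of variation) are choices made for this sketch, and the inputs are assumed to be URL lists and hit counts that the researcher has already collected.

```python
from statistics import mean, pstdev


def jaccard(results_a, results_b):
    """Overlap between two result lists: shared URLs / all distinct URLs."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def coefficient_of_variation(hit_counts):
    """Standard deviation of repeated hit counts divided by their mean.

    Expects a non-empty sequence; values near 0 suggest stable results.
    """
    m = mean(hit_counts)
    if m == 0:
        return float("inf")
    return pstdev(hit_counts) / m
```

A low overlap between comparable engines, or a high coefficient of variation for the same query on different days, would be a signal to treat the counts with caution.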
Discussion points for online data
□Coverage – to what extent are the phenomena of interest covered by the source (e.g., search engine) used?
□Sample bias – are certain types of people over-represented? (e.g., the more literate, the more vocal, the more politically active, youth, educated, creative types…)
Summary
□The web contains a wide variety of interesting web and “web 2.0” content posted by many different people in many different formats
□Webometric methods can give insights into this data
Books
□Thelwall, M. (2009). Introduction to webometrics: Quantitative web research for the social sciences. New York: Morgan & Claypool.
□Rogers, R. (2005). Information politics on the Web. Massachusetts: MIT Press.
□http://lexiurl.wlv.ac.uk
□http://webometrics.wlv.ac.uk
□http://www.issuecrawler.net
Important considerations
□Data accuracy
□Data cleaning
□Context to help interpret results
□Report results carefully
Example: Analysis of the accuracy of search engine results
Live Search results analysis