
Page 1: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Mapping social, political, and scientific landscape using webometrics method

Asso. Prof. Han Woo PARK
Department of Media & Communication, YeungNam University
214-1 Dae-dong, Gyeongsan-si, Gyeongsangbuk-do 712-749, Republic of Korea
[email protected]
http://www.hanpark.net
http://english-webometrics.yu.ac.kr
http://asia-triplehelix.org

Thanks to my colleagues and students at the WWI.

Virtual Knowledge Studio (VKS)

• Invited speech, Department of Media & Communication, City University of Hong Kong, 29 March 2010
• Topic: Mapping social, political, and scientific landscape using webometric method

Page 2: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Outline of presentation

1. development of webometrics tools to automate social Internet research process (e.g., data collection and analysis from search engines, SNS and microblogging sites)

2. experimentation with new types of data visualization across period and platform (e.g., dynamic mappings using HNA)

Page 3: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Webometrics in terms of e-research

A minor but growing approach to the study of Internet-mediated communication

A new methodological perspective based on the use of new digital tools available online for conducting humanities and social science Internet research

Page 4: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Research tradition of Webometrics

• 1) development of online tools to automate the Internet research process, such as data collection and analysis

• 2) experimentation with new types of data visualization, such as social network and hyperlink analysis and multimedia and dynamic mappings

Page 5: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

http://participatorysociety.org/wiki/index.php?title=Online_Research

Page 6: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Web Scrapers, Crawlers, Tools in WCU

Page 7: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Overview

• Collecting data from search engines: Naver.com, Google.com

• Digging Social Networking Services: Cyworld Minihompies, Facebook, Plurk

• Microblogging sites: Twitter, TwtKr.com

• Korean Internet Network Miner: A Korean version of Dr. A. Gruzd’s ICTA

• Web archiving of Korean MPs: http://www.web-archive.kr/

Page 8: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

• In various degrees of development

• Return data from the web in a form suitable for import into Excel, SPSS, LexiURL, etc.

• Returned data contain all values. Only some of these may be relevant to the current query, but keeping all of the data ensures that you can revisit it later if another project requires more variables.

• All programs have time-rests, though these vary depending on the service being accessed.
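The "time-rests" above are pauses between successive requests. A minimal Python sketch of the idea follows. The `collect` function, the `fetch` callable, and the pause lengths in `REST_SECONDS` are all hypothetical; the slides only state that the pauses vary by service.

```python
import time
from typing import Callable, Iterable, List

# Hypothetical pause lengths in seconds; the slides give no actual values.
REST_SECONDS = {"naver": 1.0, "cyworld": 2.0, "twitter": 1.5}


def collect(service: str,
            items: Iterable[str],
            fetch: Callable[[str], str],
            sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Call fetch(item) for every item, resting between consecutive calls."""
    rest = REST_SECONDS.get(service, 1.0)
    results = []
    for i, item in enumerate(items):
        if i > 0:
            sleep(rest)  # time-rest before every request except the first
        results.append(fetch(item))
    return results
```

Passing `sleep` as a parameter keeps the pacing logic testable without actually waiting.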

Page 9: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02


The purpose of this paper is to introduce the API-based webometrics tool created for the Korean search engine Naver

This non-commercial software is designed to collect large amounts of data automatically and can easily distinguish between different types of information on the web, which was impossible before.

(Image Source: Newsweek, 5 Nov 2007)

WCU Webometrics Institute: Investigating Internet-based Politics with E-research Tools

Webonaver (Webometrics Tool for Naver)

Page 10: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02


Rationale for the Naver

• “Republic of Naver” (Kim & Sohn, 2007)

• “Korea’s Naver is now the world’s 5th search service provider, behind Google, Yahoo, Baidu and Microsoft.” (The AP, 9 Oct 2007)

• “Google left behind as Koreans Naver-gate the internet” (Financial Times, 2 Jan 2008)

• “In South Korea people who want to look something up on the internet don’t ‘Google it’. Instead they ‘ask Naver’.” (Economist, 30 Feb 2009)

• Lee, Y.-O., & Park, H. W. (2008). "The Importance of Search Engines in Digital News Consumption: A Comparative Study Between South Korea and the UK". Refereed paper presented at the Workshop “Gatekeepers in a Digital Asian-European Media Landscape: The rising structural power of Internet search engines” (2008).


Page 11: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02


Component of Naver


[Screenshot labels: log-in; article titles (changing automatically); linked press; today’s issues; quick menu; browser window]

Page 12: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Naver search options

Page 13: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02


Interface


The interface is fairly self-explanatory:

- Tick or untick to collect either the hit count only or the title, URL, and description of each result

- Select which of the search options you want to include

- Click on the '...' button to select the text file that contains the queries you wish to run

- Click 'Run Queries'


Page 14: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

http://english-webometrics.yu.ac.kr/WebometricsTools/WeboNaver/WeboNaver.html

Page 15: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02
Page 16: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

U-I-G TH Trend Analysis

Search Area : Title, Content

Page 17: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02
Page 18: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

• The web presence of the term H1N1 was examined using Webonaver in order to test the usability and reliability of the tool.

Queries:
신종플루 (A virus subtype H1N1)
신종 인플루엔자 (Influenza A virus subtype H1N1)
신종인플루엔자 (Influenza A virus subtype H1N1)

• WeboNaver returns the same results for a query written with a space and the same query written without one.

• However, it does not treat similar words as the same term. Users should decide which specific data they want to extract before using the tool.
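Because spacing variants return the same results while similar words do not, it can help to sort a query list into spacing-variant groups before running it. The sketch below is only an illustration of that bookkeeping step; `group_spacing_variants` is a hypothetical helper and is not part of WeboNaver.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_spacing_variants(queries: Iterable[str]) -> Dict[str, List[str]]:
    """Group queries that differ only in whitespace."""
    groups = defaultdict(list)
    for query in queries:
        key = "".join(query.split())  # drop every whitespace character
        groups[key].append(query)
    return dict(groups)


# The three queries listed on the slide.
queries = ["신종플루", "신종 인플루엔자", "신종인플루엔자"]
groups = group_spacing_variants(queries)
```

Here the second and third queries fall into one group, while 신종플루 remains a separate term that has to be queried on its own.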

Web presence of the term H1N1

Page 19: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02


Page 20: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Monitoring a Socio-political Blogosphere in South Korea: Comparing Blogosphere Metrics with Voter Turnout

Page 21: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

• Data
– Blog postings related to 29 candidates for the 2009 Korean National Assembly by-election

• Data gathering
– Korean-language blog search engine by Naver.com
– Real-time blog monitoring program by WWI
– Search queries: the name of the candidate + “candidate”
– Search date: after Oct. 8, 2009
– Data collection period: Oct. 16 – Oct. 27, 2009 (12 days)
– Cycle: twice a day (00:00 and 12:00)
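The collection design above (one query per candidate, two runs a day for 12 days) can be sketched as follows. `build_query` and `collection_times` are hypothetical names, and the literal suffix is the English word from the slide; the wording actually sent to Naver is not shown in the slides.

```python
from datetime import datetime, timedelta
from typing import List


def build_query(name: str, suffix: str = "candidate") -> str:
    """Combine a candidate name with the suffix used in the search query."""
    return f"{name} {suffix}"


def collection_times(start: datetime, days: int) -> List[datetime]:
    """Two collection runs per day: at 00:00 and at 12:00."""
    times = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        times.append(day.replace(hour=0, minute=0))
        times.append(day.replace(hour=12, minute=0))
    return times


schedule = collection_times(datetime(2009, 10, 16), days=12)
```

With the dates given on the slide, `schedule` holds 24 collection points, the last one at noon on 27 October 2009.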

Page 22: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Trend Analysis

• Jangan district in Suwon City, Gyeonggi Province

[Figure: trend lines for (Park, CS), (Lee, CY), (Ahn, DS), (Yoon, JY)]

Page 23: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Blogs vs. Votes

• Jangan district in Suwon City, Gyeonggi Province

[Figure: N. of Votes and N. of Blogs for (Park, CS), (Lee, CY), (Ahn, DS), (Yoon, JY)]

Page 24: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Candidate | Blog | Blog % | Blog rank | Vote | Vote % | Vote rank

Constituency: Jangan, Suwon, Gyeonggi
Park, CS (박찬숙) | 213.4 | 35.6 | 2 | 33,106 | 42.7 | 2
Lee, CY (이찬열) | 216.6 | 36.1 | 1 | 38,187 | 49.2 | 1
Ahn, DS (안동섭) | 158.4 | 26.4 | 3 | 5,570 | 7.2 | 3
Yoon, JY (윤준영) | 11.8 | 2.0 | 4 | 716 | 0.9 | 4

Constituency: Sangrok-B, Ansan, Gyeonggi
Song, JS (송진섭) | 147.8 | 17.0 | 3 | 11,420 | 33.2 | 2
Kim, YH (김영환) | 280.1 | 32.3 | 1 | 14,176 | 41.2 | 1
Jang, KW (장경우) | 64.0 | 7.4 | 4 | 1,145 | 3.3 | 4
Kim, SK (김석균) | 25.7 | 3.0 | 6 | 896 | 2.6 | 6
Yoon, MW (윤문원) | 22.8 | 2.6 | 7 | 439 | 1.3 | 7
Lee, YH (이영호) | 59.5 | 6.9 | 5 | 987 | 2.9 | 5
Lim, JI (임종인) | 268.6 | 30.9 | 2 | 5,363 | 15.6 | 3

Constituency: Gangreung, Gangwon
Kwon, SD (권성동) | 85.6 | 32.9 | 1 | 29,010 | 50.9 | 1
Hong, JK (홍재경) | 68.0 | 26.1 | 3 | 2,100 | 3.7 | 4
Song, YC (송영철) | 72.1 | 27.7 | 2 | 19,867 | 34.8 | 2
Shim, KS (심기섭) | 34.9 | 13.4 | 4 | 6,054 | 10.6 | 3

Constituency: North Chungcheong (4 districts)
Kyoung, DS (경대수) | 140.2 | 25.2 | 2 | 19,427 | 28.4 | 2
Chung, BG (정범구) | 167.1 | 30.0 | 1 | 29,120 | 42.5 | 1
Chung, WH (정원헌) | 65.2 | 11.7 | 5 | 3,071 | 4.5 | 4
Park, KS (박기수) | 68.8 | 12.4 | 4 | 2,125 | 3.1 | 5
Lee, TH (이태희) | 33.2 | 6.0 | 6 | 504 | 0.7 | 6
Kim, KH (김경회) | 81.7 | 14.7 | 3 | 14,218 | 20.8 | 3

Constituency: Yangsan, South Gyungsang
Park, HT (박희태) | 258.2 | 30.4 | 1 | 16,597 | 37.9 | 1
Song, IB (송인배) | 214.2 | 25.2 | 2 | 15,577 | 35.6 | 2
Park, SH (박승흡) | 134.0 | 15.8 | 3 | 1,550 | 3.5 | 5
Kim, SG (김상걸) | 33.4 | 3.9 | 6 | 900 | 2.1 | 6
Kim, YS (김양수) | 88.7 | 10.5 | 4 | 5,875 | 13.4 | 3
Kim, YK (김용구) | 26.6 | 3.1 | 8 | 234 | 0.5 | 8
Kim, JM (김진명) | 29.3 | 3.5 | 7 | 325 | 0.7 | 7
Yoo, JM (유재명) | 64.3 | 7.6 | 5 | 2,710 | 6.2 | 4

Page 25: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Results

• Correlation Analysis (N. of Blogs & N. of Votes)
– Pearson r = .586, p < .01 (N = 29)
– Spearman rho = .797, p < .01 (N = 29)

• Simple Regression Analysis
– N. of Votes = 1,055.56 + 79.99 (N. of Blogs)
– R² = .344 (F = 14.128, p < .01)
– β = .586 (t = 3.759, p < .01)
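The three statistics reported above can be computed with a few lines of plain Python. The sketch below is not the software used in the study, and the function names are made up for this example. The sample input is only the four Jangan rows from the table, so it does not reproduce the N = 29 results. As a consistency check on the slide: in a one-predictor regression the standardized β equals Pearson r, R² equals r² (.586² ≈ .343), and F equals t² (3.759² ≈ 14.13), which agrees with the reported figures up to rounding.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """Average ranks (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


# Jangan district only (4 of the 29 candidates), taken from the table.
blogs = [213.4, 216.6, 158.4, 11.8]
votes = [33106, 38187, 5570, 716]
rho = spearman(blogs, votes)
```

Within Jangan the blog ranking and the vote ranking coincide, so `rho` is 1 for this subset; the value for all 29 candidates is lower because rankings differ in other constituencies.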

Page 26: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Summary

• Overall, the number of blog postings per candidate tended to increase over time.

• In each district, the candidate with the most blog postings won the election.

• The correlation analyses (Pearson and Spearman) indicate a significant positive relationship between blog postings and votes.

• In the simple regression analysis, the number of blog postings per candidate can be regarded as a significant predictor of the number of votes.

Page 27: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Cyworld

• Collects profile information from the public messages posted to an initial seed user

• Takes approximately 10 seconds per user request

• Stores user details so subsequent calls are not needed

• As a result of the high numbers of comments on some Cyworld pages, the process of collecting the data can take several days
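Storing user details so that repeat commenters do not trigger another request can be sketched as a small cache. This is an illustration in Python rather than the Java tool itself; `ProfileCache` and `fetch_profile` are hypothetical names.

```python
from typing import Callable, Dict


class ProfileCache:
    """Fetch each user's profile once and reuse it for later comments."""

    def __init__(self, fetch_profile: Callable[[str], dict]) -> None:
        self._fetch = fetch_profile
        self._store: Dict[str, dict] = {}
        self.requests_made = 0

    def get(self, user_id: str) -> dict:
        if user_id not in self._store:
            # Only unseen users cost a (slow) request.
            self._store[user_id] = self._fetch(user_id)
            self.requests_made += 1
        return self._store[user_id]
```

At roughly 10 seconds per user request, every cache hit saves about that much time, which matters on pages with very many comments.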

Page 28: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Cyworld Extractor - Overview

Java-based software tool that, given the URL of a politician on Cyworld, extracts comments left by citizens along with related profile attributes.

The extracted data, which can amount to thousands of records, are stored in a format suitable for import into statistical software.

Page 29: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Geun-Hye Park’s mini-hompy

[Screenshot labels: name; gender; visitor count; status of the mini-hompy: ① how active, ② how famous, ③ how friendly]

Page 30: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02
Page 31: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

• After the South Korean government concluded negotiations on American beef imports in April 2008, there was considerable conflict between the government and public opinion during May and June 2008.

• As the graph indicates, the number of comments on all Assembly members’ mini-hompies peaked in May and June 2008.

• Among them, the largest numbers of comments were recorded on the mini-hompies of Kyung-Tae Jo and Kyeong-Won Na.

Page 32: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

South Koreans fearing 'mad cow disease' fight US beef imports in May and June 2008

Page 33: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02
Page 34: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02
Page 35: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

IP address

Cyworld IP screen capture: Seong-Min Yoo’s mini-hompy

Page 36: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Cyworld Extractor – Data

One example of possible uses for the collected data is to determine the region of posters commenting from Korea

Page 37: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Cyworld Extractor - Data

It is also possible to determine the country of origin of users commenting from outside Korea

Page 38: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02


Case 2. Cyworld Mini-hompies of Korean Legislators

Cyworld Mini-hompies of Korean legislators: Co-inlink network map using Yahoo.com

However, buddy data is not publicly available!!

Page 39: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Facebook

• Searches for groups with links to petition sites

• Stores group membership numbers

• Queries petition site and stores number of signatures

• Takes approximately 10 seconds per request

• No interface

Page 40: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Facebook

Page 41: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Plurk

• Gathers friends and fans list from an initial seed user

• Returns two text files: one containing friends and one containing fans

• No interface at present and all commands must be entered through a command prompt

• Takes approximately 5 seconds per request

Page 42: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Plurk

Page 43: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Research examples on Plurk

Karma: the system gives each user a score. Karma indicates how active the user is (e.g., messaging, commenting, use of the system's emoticons). Hovering the mouse over the Karma score shows the user's Karma trend.

Page 44: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Google

• Collects a maximum of 1,000 top search listings

• Writes the listing URLs out to a text file

• Interface allows setting certain parameters, such as file type, language, and country

• More can be added to the current list of options

• Takes approximately 3 seconds per page of results (1 page = 100 results)
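The figures above imply at most 10 result pages and roughly 30 seconds per query. A small sketch of that arithmetic follows; the constants come from the slide, while `page_offsets` and `estimated_seconds` are hypothetical helpers and say nothing about how the tool actually pages through results.

```python
from typing import List

MAX_RESULTS = 1000      # maximum listings collected per query (from the slide)
PAGE_SIZE = 100         # results per page (from the slide)
SECONDS_PER_PAGE = 3    # approximate time per page (from the slide)


def page_offsets(wanted: int) -> List[int]:
    """Start offset of every page needed for `wanted` results, capped at the maximum."""
    capped = min(wanted, MAX_RESULTS)
    return list(range(0, capped, PAGE_SIZE))


def estimated_seconds(wanted: int) -> int:
    """Rough collection time for one query."""
    return len(page_offsets(wanted)) * SECONDS_PER_PAGE
```

For a full 1,000-listing query this gives offsets 0, 100, ..., 900 and an estimate of 30 seconds.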

Page 45: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Twitter

• Collects follower/following and Tweets from a chosen user

• Has a 150 hit rate-limit imposed by Twitter

• When rate limit reached, program will pause and show an indefinite progress dialog until the rate limit renews

• User can log in using their Twitter credentials and these will optionally be stored for a future session
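The pause-until-renewal behaviour described above can be sketched as a simple fixed-window limiter. This is an illustration, not the extractor's code: `RateLimiter` is a hypothetical class, and the one-hour window is an assumption, since the slide gives only the 150-hit figure.

```python
import time
from typing import Callable


class RateLimiter:
    """Allow at most `limit` calls per window; pause until the window renews."""

    def __init__(self,
                 limit: int = 150,
                 window_seconds: float = 3600.0,  # assumed window length
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._window_start = clock()
        self._used = 0

    def acquire(self) -> None:
        """Call before every request; blocks when the limit is exhausted."""
        now = self._clock()
        if now - self._window_start >= self._window:
            self._window_start, self._used = now, 0
        if self._used >= self._limit:
            # Limit reached: wait for the remainder of the current window.
            self._sleep(self._window - (now - self._window_start))
            self._window_start, self._used = self._clock(), 0
        self._used += 1
```

In a GUI tool the `sleep` step is where a progress dialog would be shown while waiting.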

Page 46: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Twitter Extractor - Overview

Sharing a similar interface and extraction mechanism with the Cyworld extractor, this application requires the URL of a user on Twitter. It is then possible to collect all tweets and determine the attributes of the user’s follower / following network

Page 47: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Twitter Extractor - Data

A simple use for this data would be to visualize a user’s network and ascertain which users are reciprocal in their friendships
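Reciprocal friendships are the accounts that appear in both the follower list and the following list. A minimal sketch, assuming the two lists have already been collected as sets of user names (the names below are invented):

```python
from typing import Set


def reciprocal(followers: Set[str], following: Set[str]) -> Set[str]:
    """Users who follow the account and are followed back by it."""
    return followers & following


followers = {"alice", "bob", "carol"}
following = {"bob", "carol", "dave"}
mutual = reciprocal(followers, following)
```

The resulting set (`bob` and `carol` here) gives the edges to draw as two-way ties in a network visualization.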

Page 48: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

A case study of Twitter use by members of the 18th National Assembly

* Types of tweets
* Audiences of tweets
* Topics of tweets

Page 49: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Twtkr.com Scraper

Page 50: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Twitter.com VS Twtkr.com

• Korean twitter messages are not well indexed in Twitter.com

• Twtkr.com is customized for retrieving Korean twitter messages

• A scraper was made to automate the data collection procedure

• Korean tweets including ‘Sejong city’ (세종시) were harvested daily during March

Page 51: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Sejong City Project

• Current President Lee MB is trying to change the existing plan, drafted under ex-President Roh MH, which is structured around relocating several government offices to the city

• Proponents of the original plan: necessary for regional development

• Opponents: partitioning of the capital would weaken Seoul’s competitiveness

Page 52: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Identifying ‘twitter-tariat’

• Twitter-tariat: A group that responds and gives meaning to social issues via Twitter (modified from N. Anstead & B. O’Loughlin’s "viewertariat")

• Top 10 twitterians in terms of the occurrence of their tweets related to ‘Sejong’ city ( 세종시 )

• Investigating who they are, who follows them, who they follow, what they tweet; and ‘networked’ positions among peers
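Ranking accounts by how often their tweets mention the keyword is a simple counting task. The sketch below assumes each harvested tweet is a dictionary with hypothetical `user` and `text` fields; the scraper's real output format is not shown in the slides.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_authors(tweets: Iterable[Dict[str, str]],
                keyword: str,
                n: int = 10) -> List[Tuple[str, int]]:
    """Return the n users with the most tweets containing the keyword."""
    counts = Counter(t["user"] for t in tweets if keyword in t["text"])
    return counts.most_common(n)
```

Calling `top_authors(tweets, "세종시")` on a month of harvested tweets would give the candidate list for the 'twitter-tariat' analysis.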

Page 53: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Korean ‘twitter-tariat’ on Sejong city during March 2010

[Figure: count. Legend: type (media company, individual); location (Je-ju, Seoul, Chung-nam, overseas, Dae-jeon, others, S. Korea)]

Page 54: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

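Tweet

[Figure: keyword 세종시 (Sejong city). Legend: type (media company, individual); location (Je-ju, Seoul, Chung-nam, overseas, Dae-jeon, others, S. Korea)]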

Page 55: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

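Follower

[Figure: keyword 세종시 (Sejong city). Legend: type (media company, individual); location (Je-ju, Seoul, Chung-nam, overseas, Dae-jeon, others, S. Korea)]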

Page 56: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

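Following

[Figure: keyword 세종시 (Sejong city). Legend: type (media company, individual); location (Je-ju, Seoul, Chung-nam, overseas, Dae-jeon, others, S. Korea)]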

Page 57: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Korean Internet Network Miner: A Korean version of ICTA

Page 58: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Section 1. Development of the Korean Internet Network Miner

After the blog data were retrieved, they were processed to build two types of networks.

• First, a chain network was extracted. In the chain network, one commentator is connected to another if the first commentator directly replied to the second by clicking on the "reply-to" button.

• However, after manually examining a number of comments on several blogs, we found that some comments are not "reply-to" comments but still address or reference a previous poster.

This observation is in line with a previous empirical study on online learning communities by Gruzd (2009a), which discovered that the chain network misses on average 40% of possible connections.

To capture the missing connections, we decided to rely on another network discovery method called the name network.
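The chain network described on this slide connects each replying commentator to the author of the comment replied to. A minimal sketch, assuming each comment is a dictionary with hypothetical `id`, `author`, and `reply_to` fields (the miner's real data model is not shown):

```python
from typing import Dict, List, Optional, Tuple

Comment = Dict[str, Optional[str]]


def chain_network(comments: List[Comment]) -> List[Tuple[str, str]]:
    """Directed edges (replier, original author) from explicit reply-to links."""
    author_of = {c["id"]: c["author"] for c in comments}
    edges = []
    for c in comments:
        parent = c.get("reply_to")
        if parent is not None and parent in author_of:
            edges.append((c["author"], author_of[parent]))
    return edges
```

Comments that address someone only by name produce no edge here, which is exactly the gap the name network is meant to fill.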

Page 59: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

<Name Network>

Another good example of the challenges associated with name/nickname disambiguation in comments is the word "2mb", which has at least three different meanings.

First, this word can be used as a nickname for one of the blog commentators. Second, it could refer to the capacity of a computer memory (2 megabytes). Finally, it could be an alias of the current Korean president, Lee Myung-Bak.

To address these challenges and develop recommendations for the next generation of the name network discovery algorithm, we conducted a semi-automated analysis of all names/nicknames discovered from a sample dataset using our initial algorithm.

Section 1. Development of the Korean Internet Network Miner

Page 60: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

The evaluation procedure involved clicking on each word found by the name network algorithm and exploring the context where each instance of the word was used (see Figure 3). The purpose of this semi-automated analysis was to discover what name/nickname candidates were identified incorrectly and why.

<Figure 3> A list of messages containing "2MB"

This semi-automated analysis revealed a set of additional syntactic and semantic clues that can be used to improve the accuracy of the name network discovery algorithm.

Section 2. Evaluation of the Name Network Discovery Algorithm

Page 61: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Section 2. Evaluation of the Name Network Discovery Algorithm

The second set includes clues suggesting that a word is NOT likely to be used as a nickname:

● a word candidate is a phrase, for example if the nickname input (the "FROM" field) is used more like a subject line (possible indicators include white spaces and length);

● a word candidate consists of a single character (e.g., "a" or "ㄱ");

● a word candidate consists of netspeak, including emoticons (e.g., "=_="), slang and abbreviations (e.g., using "2MB" to refer to the current Korean president), and onomatopoeia (e.g., "ㅉㅉ" = tsk tsk, "ㅋㅋ" = heehee, "하하" = haha, "음" = hmm);

● a word candidate appears more than one time in the comment;

● a word candidate consists of random characters (e.g., "ㅁㄴㅇㄹ" or "asdf");

● a word candidate is a short, conversational word or phrase (e.g., "나나" = me, "아이고" = oh no, "그래서" = so/therefore);

● a word candidate is a common word or idea in the given context/topic (e.g., "대한민국" = Republic of Korea, "쥐체사상" = a newly created word used to refer to political fanatics).
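Several of these negative clues are easy to express as a rule-based filter. The sketch below covers only four of them (phrase-like input, single character, netspeak or conversational word, repetition within the comment). The word lists reuse examples from the slide, the length threshold is an assumption, and `unlikely_nickname` is a hypothetical function, not the algorithm used in the Korean Internet Network Miner.

```python
# Examples taken from the slide; real lists would be much longer.
NETSPEAK = {"ㅉㅉ", "ㅋㅋ", "하하", "음", "=_=", "2MB"}
CONVERSATIONAL = {"아이고", "그래서"}
MAX_LENGTH = 12  # assumed cut-off for "used more like a subject line"


def unlikely_nickname(candidate: str, comment: str) -> bool:
    """Return True if any negative clue suggests the candidate is not a nickname."""
    word = candidate.strip()
    if len(word) <= 1:
        return True  # single character (or empty)
    if any(ch.isspace() for ch in word) or len(word) > MAX_LENGTH:
        return True  # phrase-like: contains white space or is too long
    if word in NETSPEAK or word.upper() in NETSPEAK or word in CONVERSATIONAL:
        return True  # netspeak, slang, or a short conversational word
    if comment.count(word) > 1:
        return True  # appears more than once in the comment
    return False
```

Clues that depend on topic knowledge, such as common words in the given context, would need a domain-specific list and are left out of this sketch.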

Page 62: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

http://www.openamplify.com/

Page 63: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

http://www.openamplify.com/

Page 64: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

http://www.openamplify.com/

1,000 free requests per day

Page 65: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Chosun VS OhMyNews

• The influential print-media establishment is composed of the "big three" conservative dailies, the Chosun, JoongAng, and Dong-A Ilbos, which lead the nation in circulation.

• OhMyNews: A new type of participatory journalism with its thousands of ordinary citizens as contributors.

Page 66: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

OhMyNews vs. Chosun: Emotionality comparison (Jul 2009 - Feb 2010)

[Figure: mean polarity by topic for OhMyNews and Chosun. Vertical axis from -1.00 to 1.00. Topic labels on the horizontal axis include water, France, EU, Africa, Kabul, Pakistan, Hollywood, parliament, Google, Copenhagen, Obama, Haiti, China, Afghanistan, Taliban, climate change, opposition, H1N1, pandemic, Israel, UN, Internet, economy, North Korea, ASEAN, Uganda, and Brussels.]

Page 67: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

• Using sentiment analysis, we are trying to find differences and similarities in the emotional polarity of the main topics covered in news stories by OhMyNews versus Chosun.

• "MEAN POLARITY" represents polarity on a scale from -1 (negative) to 1 (positive) for 78 popular topics covered in both newspapers.

• For example, the topic "Uganda" tends to be mentioned in a positive context by OhMyNews but in a negative context by Chosun, while the topic "opposition" tends to be neutral in OhMyNews but positive in Chosun.
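Mean polarity per topic and outlet is an average of per-mention scores. The sketch below assumes the sentiment service has already returned one polarity value in [-1, 1] for each topic mention; the tuple layout and the sample scores are invented for illustration and are not data from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mean_polarity(scores: Iterable[Tuple[str, str, float]]) -> Dict[Tuple[str, str], float]:
    """Average polarity for every (outlet, topic) pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for outlet, topic, polarity in scores:
        totals[(outlet, topic)] += polarity
        counts[(outlet, topic)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Invented scores for illustration only.
sample = [
    ("OhMyNews", "Uganda", 0.5), ("OhMyNews", "Uganda", 0.25),
    ("Chosun", "Uganda", -0.5), ("Chosun", "Uganda", -0.25),
]
means = mean_polarity(sample)
```

Comparing `means[("OhMyNews", topic)]` with `means[("Chosun", topic)]` for each shared topic gives the kind of contrast plotted in the figure.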

Page 68: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

• Web archiving of Korean MPs: http://www.web-archive.kr/

Page 69: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02
Page 70: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Experimentation with new types of data visualization across period and platform (e.g., dynamic mappings using HNA)

Page 71: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Sociology of Hyperlink Networks of Web 1.0, Web 2.0, and Twitter

Data Collection for Web 1.0

• Official homepages of South Korean Assembly members

• Manual collection: Observation

• Inter-linkage: Who links to whom matrix

• Explicit links excluding links in board

• 2-Year tracking of same Assembly members: 2000-2001

Page 72: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Web 1.0

[Figure: network maps for 2000 and 2001]

‣ 59 isolated in 2000
‣ more centralised in 2001
‣ network of 2001 a ‘star’ network
➭ might have been affected by political events: presidential election in 2001

Page 73: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

• Data collection for Web 2.0

• Personal blogs of South Korean Assembly members

• Manual collection: Observation

• Blogroll links: Excluding links in postings

• Inter-linkage: Who links to whom matrix

• 2-Year tracking of same Assembly members: 2005-2006

• Phone interview about usage behaviours

Page 74: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Web 2.0

[Figure: network maps for 2005 and 2006]

‣ hubs disappearing
‣ easy use of blogs
‣ clear boundaries between different parties
‣ strong presence of GNP Assembly members
➭ party policy on using blogs

Page 75: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Twitter

‣ more connections between different parties
‣ the ruling party pays less attention to alternative media

Page 76: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Web type | Year | Sum of links (mean) | Density | Centralisation (in) | Centralisation (out) | Gini coefficient
Web 1.0 (N=245) | 2000 | 373 (1.52) | 0.006 | 1.84 | 69.33 | 0.984
Web 1.0 (N=245) | 2001 | 515 (2.10) | 0.009 | 1.19 | 99.55 | 0.996
Web 2.0 (N=99) | 2005 | 652 (6.59) | 0.067 | 22.07 | 41.66 | 0.759
Web 2.0 (N=99) | 2006 | 589 (5.95) | 0.061 | 20.67 | 35.10 | 0.763
Twitter (N=22) | 2009 | 111 (5.05) | 0.240 | 24.72 | 39.68 | 0.408
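The mean and density columns can be recomputed from N and the sum of links. The sketch below assumes the usual density definition for a directed network without self-links, links / (N × (N − 1)); the slides do not state the formula, but the reported values are consistent with it. The function names are made up for this example.

```python
def mean_links(links: int, n: int) -> float:
    """Average number of links per node."""
    return links / n


def density(links: int, n: int) -> float:
    """Share of possible directed ties (no self-links) that are present."""
    return links / (n * (n - 1))


# (web type, year, N, sum of links) as given in the table.
rows = [
    ("Web 1.0", 2000, 245, 373),
    ("Web 1.0", 2001, 245, 515),
    ("Web 2.0", 2005, 99, 652),
    ("Web 2.0", 2006, 99, 589),
    ("Twitter", 2009, 22, 111),
]
summary = [(web, year, round(mean_links(links, n), 2), round(density(links, n), 3))
           for web, year, n, links in rows]
```

The jump in density from 0.006 to 0.240 partly reflects network size: with only 22 Twitter accounts there are far fewer possible ties than among 245 homepages.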

Page 77: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

‣ Network analysis

- Web 1.0 (homepage): loose, few important hubs & becoming a star network

- Web 2.0 (blog): denser, clear boundaries between opposition groups

- Twitter: denser than blog networks

- attributable to technological development ➭ more interactive/participatory

Page 78: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

‣ Findings on online activities (Web 2.0 & Twitter) reflect offline situations

- Party policies affected the use of the Web for political purposes

- Progressive/minor groups more willing to explore alternative media

Page 79: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Incoming International Hyperlink in 2009 (drawn using ManyEyes.com)

Page 80: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Incoming International Hyperlink in 2009 (drawn using Google Earth)

Page 81: Mappingsocialpoliticalandscientificlandscapeusingwebometrcs Cityunivofhongkong24march2010 100324011529 Phpapp02

Thank you for listening!

WCU Webometrics Institute

Acknowledgments. The WCU Webometrics Institute acknowledges that this research is supported by the WCU project investigating Internet-based politics using e-research tools, granted by the South Korean Government.