Institutional interactions: Exploring the social, cognitive, and ...ey86/papers/institution7.pdfand...



Institutional interactions: Exploring the social, cognitive, and geographic relationships between institutions as demonstrated through citation networks

Erjia Yan¹, Cassidy R. Sugimoto

School of Library and Information Science, Indiana University, Bloomington, USA

Abstract

The objective of this research is to examine the interaction of institutions, based on their citation and collaboration networks. The domain of Library and Information Science (LIS) is examined, using data from 1965-2010. A linear model is formulated to explore the factors that are associated with institutional citation behaviors, using the number of citations as the dependent variable, and the number of collaborations, physical distance, and topical distance as independent variables. It is found that the institutional citation behaviors are associated with social, topical, and geographical factors. Dynamically, the number of citations is becoming more associated with collaboration intensity and less dependent on the country boundary and/or physical distance. This research is informative for scientometricians and policy makers.

Introduction

For the past several decades, citation counts have played a dominant role in assessing the impact of different research aggregates, such as papers, authors, journals, institutions, and domains. Citation counting is a well established scholarly measurement to quantitatively study the creation and dissemination of knowledge. With the advent of comprehensive paper repositories, mankind’s scientific endeavors can be further observed and analyzed. In particular, through scholarly network analysis, scientists and policy makers have gained unprecedented insights into the interaction of various research aggregates.

The interaction of research aggregates can be explored from different perspectives. For example, social networks (such as coauthorship networks) focus on the pattern of interactions between social actors. Cognitive networks (such as cocitation networks) focus on identifying research topics or disciplines. In knowledge networks (such as citation networks), each node is a piece of knowledge and a link denotes the knowledge flow (Newman, 2003). In these scholarly networks mentioned above, an article is usually a single evaluation unit and can be aggregated into several higher levels, for instance, the author unit (e.g. coauthorship networks, author co-citation networks, author bibliographic coupling networks, or author citation networks) and the journal unit (e.g. journal co-citation networks, journal citation networks, or journal bibliographic coupling networks).

However, another higher-level research aggregate, the institution, is rarely studied. An institution is a stable and representative unit for studying the production, diffusion, and consumption of knowledge. An institution is a distinct research entity which provides an opportunity for the combination of mappings from social, geographical, and cognitive perspectives. In addition, institution citation networks act as unique tools to examine scholars’ collective citation behaviors. The present study aims to fill this gap by studying institution citation networks in library and information science (LIS). Using 45 years of data on LIS publications, several dynamic institution citation networks are constructed. These networks are used to address several essential questions on the social, geographical, and cognitive aspects of institution interactions:

• What are institutions’ roles in terms of knowledge exportation and importation?
• How do institutions interact with each other via citations?
• What factors are associated with institutional citation behaviors among LIS institutions?

1 Correspondence to: Erjia Yan, School of Library and Information Science, Indiana University, 1320 E. 10th St., LI011, Bloomington, Indiana, 47405, USA. Email: [email protected]

These questions are predicated on the assumption that citations serve as a proxy for a knowledge interaction. As described by Small (1978), citations serve as “concept markers” in literature. That is, by citing a document, a scholar is acknowledging an intellectual debt to the author of the previous work. The document is merely a proxy for that knowledge. In this way, by citing, one is acknowledging knowledge transfer and interaction. Recent studies of citation motivation have found concept marking to be one of the most dominant motivations for citing (Case & Miller, 2011).

One of the main factors to be examined in this work is the concept of proximity, with a specific emphasis on geographic proximity (Boschma, 2005). In short, the relationship between interaction (as shown through citations) and proximity (as shown through geographic distance, topical distance, and collaboration intensity) will be examined. The answers to these questions are informative to scholars and policy makers concerned with areas such as research evaluation, social network analysis, institution rankings and comparisons. The study of institutional citations informs the understanding of knowledge production and dissemination, and it contributes to the social and cognitive studies of citation pattern and behavior. Scholars working in LIS may also find the article informative as it reviews the development of this field at the institution level for the past half century.

Related work

Units of analysis

The study of the diffusion of scholarly knowledge has been conducted at several levels. At the author level, Radicchi et al. (2009) used the concept of “scientific credit” to denote citations in an author citation network. The aim of their work was to differentiate the credits of different authors. Intuitively, an author who is cited by authors with higher scientific credits would receive more credits. Attempts have also been made to differentiate popular and prestigious authors in collaboration networks (Ding & Cronin, 2011; Yan & Ding, 2011). The precision of author level evaluation, however, is influenced by the quality of name disambiguation. Unfortunately, the identification of unique author names is still an open question (Börner et al., 2006).


Journal citation networks are often employed to capture knowledge diffusion among research domains. Previous investigations of journal citation networks have followed two main directions. On one hand, efforts have focused on introducing fine-grained indicators for scientific evaluation; on the other hand, scholars have developed various visualization techniques to map scientific fields. There is a persistent debate on the validity of using the Journal Impact Factor in research evaluation. The emergence of network-based indicators brought new perspectives to the discussion. The underlying assumption of these indicators (Eigenfactor, Y-Factor, and SCImago Journal Rank Indicator) is that a journal is said to be prestigious if it is cited by other prestigious journals (Bergstrom & West, 2008; Bollen et al., 2006; SCImago, 2007). The calculation differs from traditional citation counting, in which the source of a citation is not considered. Another thread of research focuses on scientific visualizations (Börner et al., 2003; Boyack et al., 2005; Van Eck & Waltman, 2010). Using labels from subject categories or cluster analysis, these maps provide insights into how domain knowledge is traversed and diffused among varied disciplines.

On the country level, scholars have provided a number of indicators to compare scientific achievement of countries. These studies have looked at relationships between country wealth and scholarly output (King, 2004); concepts of “international openness” in publishing venues (Zitt, Ramanana-Rahary, & Bassecoulard, 2003); international mobility of scholars (Veugelers, 2010); and have attempted to distinguish between quantity and quality of scientific output (Nejati & Jenab, 2010). Methods have included co-citation analysis (e.g., Klavans & Boyack, 2010), contribution to total output, and complex models that take into account multiple elements of productivity (e.g., Hung, Lee, & Tsai, 2009). However, although these analyses provide comparisons between countries, the majority do not analyze interactions between them.

Institutions provide an opportunity to examine interactions between countries on a more granular level. An institution as a research aggregate has two unique features: an institution embodies a group of authors and thus it acts as a proxy to study collective author citation behaviors; meanwhile, an institution is associated with a location, which can lead to spatio-temporal discoveries on knowledge production and dissemination. In regard to indicators, various citation-based indicators have been applied to evaluate the research impact of institutions. One practical outcome is university rankings. For example, the Leiden Ranking (CWTS, 2010) provides a comprehensive evaluation of universities worldwide. Indicators include number of publications (P), citations-per-publication (CPP), size-independent field-normalized average impact, a.k.a. the crown indicator (CPP/FCSm and MNCS2), and size-dependent impact indicator (P * CPP/FCSm). In addition, experts’ opinions, such as peer opinion data, have been used by U.S. News & World Report in ranking graduate schools (Morse & Flanigan, 2010). Besides the crown indicator and peer opinion data, the h-index, another well-known measurement, has also been applied to institution evaluations. Molinari and Molinari (2008) proposed a size-independent h-index, which makes it possible to compare universities or laboratories of disparate sizes.

In regard to scholarly networks, Börner et al. (2006) studied an institution citation network for the 500 most highly cited institutions which published papers in the Proceedings of the National Academy of Sciences. They used the metaphor of producers and consumers to denote cited institutions and citing institutions respectively. They found that the prevalence of the Internet did not affect the diffusion and consumption of knowledge among research institutions. Both cartographic and geographic maps have been used to visualize scientific productivities. For example, Carvalho and Batty (2006) used cartograms to illustrate the geographical distribution of scientific productivity of computer science in the U.S. They found that institutional productivity still follows a power law distribution when transforming from geographic to cartographic space. Leydesdorff and Persson (2010) overlaid an institution collaboration network on several geographic maps, including Google Earth, Google Maps, and Pajek. Such visualizations facilitate users’ perception of the distribution of scientific productivity. In a review article, Frenken, Hardeman, and Hoekman (2009) outlined scientometric studies that take spatial dimensions into account. They suggested a framework with five proximity dimensions (physical, cognitive, social, organizational, and institutional) for studying scientific interactions.

Proximity

The concept of proximity has been examined by many scholars, particularly in reference to learning, innovation, and knowledge diffusion. Proximity is often seen as having multiple elements: for example, Boschma’s (2005) distinction between cognitive, organizational, social, institutional, and geographical proximity. Tensions between these dimensions have been explored (Torre & Rallet, 2005) and regressions have been sought to examine the effect of proximity on various aspects of academic productivity and collaboration, with studies identifying geographic proximity as a leading impact over the other dimensions (Nagpaul, 2003).

Many of these studies, particularly those investigating innovation, have viewed proximity in a negative or constraining lens (Breschi & Lissoni, 2009). However, with the rise of online databases and the international ubiquity of the Internet, there has been a growing “belief that the concepts of place and territory are without meaning in a globalizing society and economy” (Matthiessen, Schwarz, & Find, 2002, p. 903). However, studies examining the relationship between physical distance and scientific interaction have identified many complexities. Matthiessen, Schwarz and Find (2002) found that while citation networks were independent of distance, they were still bounded by nationality. Other studies found scientific collaboration to be largely confined within physical proximity; however this proximity was challenged by elements of elitism within the scientific network (Hoekman, Frenken, & van Oort, 2009). Ponds, van Oort, and Frenken (2007) provided a further refinement by looking at collaborations between industry, government, and academe--finding that while academic-academic collaborations were not governed by proximity, cross-institutional collaborations were more likely to occur in close geographic proximity. Despite these complexities, the majority of studies to date still defend the belief that physical proximity is a leading factor in academic citation and collaboration patterns (e.g., Havermann, Heinz, & Kretschmer, 2006; Katz, 1994; Liang & Zhu, 2002). This study will explore this assumption in regards to the LIS institution network.


Methods

The dataset used in this analysis was drawn from all documents from the 59 journals indexed in the 2008 version of the Journal Citation Reports (JCR) in the Information Science & Library Science category². All document types published within these journals from January 1965 to February 2010 were downloaded for analysis³.

Data were processed in several steps. The first step was to filter the dataset in order to create a local citation network between institutions. Documents without addresses were filtered out, since no institution information can be extracted from such records to construct institution citation networks. Documents that were not cited by other documents in the dataset were then excluded, as citations served as the unit indicating a relationship between documents. The final dataset therefore consisted of documents with at least one author affiliation that had been cited by another document (containing author affiliations) within the dataset. Citation counts between the documents were then calculated using the concept of “internal citation”, that is, the number of times an article has been cited by other articles in the network, thereby reflecting local impact. Table 1 shows the size of the dataset at each stage of processing.

Table 1. Size of the dataset at various stages in processing

Stage                                        Count
Size of all publications                     207,321
Size of publications that have addresses     152,948
Size of publications that have been cited    24,826
Number of internal citations                 128,402

The second step involved identifying unique institution names from the affiliation data. Identifying unique institution names is a complicated task (van Raan, 2005), especially when using a dataset covering multiple time periods. For ease of analysis, different organizations associated with a university are combined and recognized as a single institution. For example, LIB HARVARD UNIV, LIBRARY HARVARD UNIV, HARVARD UNIV LIB, HARVARD UNIV HOSP, and HARVARD UNIV are all merged into HARVARD UNIV. For American affiliations, the format of affiliation was unified into “AFFILIATION NAME,CITY”; for international affiliations, the format of affiliation was unified into “AFFILIATION NAME,COUNTRY”. For articles with multiple authors and multiple affiliations, each unique pair of citing and cited affiliations was counted as one citation link. For example, given citing article A with three affiliations (inst_a, inst_b, and inst_c), and cited article B with two affiliations (inst_b and inst_d), the following six citation links are formed: inst_a-inst_b, inst_a-inst_d, inst_b-inst_b, inst_b-inst_d, inst_c-inst_b, and inst_c-inst_d. A detailed guideline on institution name disambiguation can be found at the footnoted link⁴.
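The pairing rule in the example above can be sketched in a few lines of Python. This is only an illustration of the counting rule as described; the function name `citation_links` is hypothetical, and the sketch assumes that affiliation strings have already been unified.

```python
from itertools import product


def citation_links(citing_affiliations, cited_affiliations):
    """Return every (citing institution, cited institution) pair for one citation.

    Assumption: affiliation names are already disambiguated and unified.
    """
    # dict.fromkeys drops repeated affiliations while keeping first-seen order.
    citing = list(dict.fromkeys(citing_affiliations))
    cited = list(dict.fromkeys(cited_affiliations))
    # The Cartesian product yields one link per unique affiliation pair.
    return list(product(citing, cited))


# Citing article A (three affiliations) cites article B (two affiliations).
links = citation_links(["inst_a", "inst_b", "inst_c"], ["inst_b", "inst_d"])
for citing, cited in links:
    print(f"{citing}-{cited}")
```

Note that inst_b-inst_b is kept as an institutional self-citation, as in the example in the text.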

The dataset was then divided into four subsets based on the citing papers’ year of publication. Time spans are longer for the first two periods as the first years provided insufficient data to provide comparable networks. Table 2 shows the size of institution citation networks, paper citation networks, and number of internal citations for the four time periods and for the full 1965-2010 span.

Table 2. Size of institution citation networks

Time         Size of institution citation networks   Size of paper citation networks   No. of internal citations
1965-1990    1,542*1,542                             4,573*5,473                       8,494
1991-2000    2,906*2,906                             9,750*9,750                       25,957
2001-2005    3,010*3,010                             9,280*9,280                       35,500
2006-2010    3,783*3,783                             10,998*10,998                     58,451
1965-2010    6,411*6,411                             24,826*24,826                     128,402

2 There are 61 journals categorized as Information Science & Library Science in 2008; two journals written in foreign languages were excluded: PROF INFORM and Z BIBL BIBL. Thus the total number of journals in the data set is 59.

3 See data and visualizations at http://info.slis.indiana.edu/~eyan/papers/citation/

4 http://info.slis.indiana.edu/~eyan/papers/citation/name_disambiguation.txt

The purpose of the current study is to quantitatively measure the knowledge flow among LIS institutions and to identify the patterns of knowledge flow. To achieve the first goal, the concepts of knowledge exporter and knowledge importer are proposed. For each institution, the flow of knowledge can be measured both by incoming knowledge (cited documents) and outgoing knowledge (citing documents). Cronin and Meho (2008) used an economic metaphor of export and import to denote the citation traffic among various scientific disciplines. Stigler (1994) used “import-export statistics” to measure the balance of trade for journals. In this study, a similar metaphor is adopted in that an institution exports its domain knowledge through citations and imports other institutions’ knowledge through references. Therefore, an institution has double roles of knowledge exporter and importer. Salient exporters play a more important role in producing and distributing knowledge, and can be distinguished through comparing their incoming and outgoing citations: salient exporters can be defined as institutions whose incoming citations outweigh outgoing citations.
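The exporter definition above lends itself to a short sketch. The Python below is a minimal illustration, not the procedure used in the study: the names `citation_balance` and `salient_exporters` are hypothetical, the link counts are toy values, and institutional self-citations are assumed to count on both the incoming and the outgoing side.

```python
from collections import Counter


def citation_balance(link_counts):
    """Tally incoming and outgoing citations per institution.

    link_counts maps (citing institution, cited institution) -> number of citations.
    """
    incoming, outgoing = Counter(), Counter()
    for (citing, cited), n in link_counts.items():
        outgoing[citing] += n   # references made: knowledge imported
        incoming[cited] += n    # citations received: knowledge exported
    return incoming, outgoing


def salient_exporters(link_counts):
    """Institutions whose incoming citations outweigh their outgoing citations."""
    incoming, outgoing = citation_balance(link_counts)
    institutions = set(incoming) | set(outgoing)
    return sorted(i for i in institutions if incoming[i] > outgoing[i])


# Toy data (hypothetical counts, for illustration only).
toy_links = {
    ("inst_a", "inst_b"): 5,
    ("inst_c", "inst_b"): 2,
    ("inst_b", "inst_a"): 1,
    ("inst_b", "inst_b"): 3,
}
print(salient_exporters(toy_links))
```

In the toy data, inst_b receives ten citations and makes four, so it is the only salient exporter.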

To achieve the second goal, several factors, including physical distance, country boundary, institution type, collaboration intensity, topical distance, and citation time, are used to investigate the institutional citation behavior in LIS.

Latitude and longitude information of each institution was harvested via Google API. Distance between two institutions was then calculated using institutions’ latitudes and longitudes. Therefore, the null and research hypotheses are:

• H1.0: There is no relationship between physical distance and institution citation connectivity; and

• H1.1: There is a demonstrable relationship between physical distance and institution citation connectivity.
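The text does not state which formula was used to turn latitude and longitude into a distance. One common choice is the haversine great-circle distance, sketched below as an assumption rather than as the method of the study; the function name `haversine_km` and the mean Earth radius constant are illustrative choices.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two points on the equator a quarter of the way around the globe.
quarter_equator = haversine_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter_equator, 1))
```

The haversine form treats the Earth as a sphere, which is usually adequate when distances between institutions are measured in tens to thousands of kilometers.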

Besides geographical distance, country boundaries may be another factor associated with institutional citation behavior. Institutions near a country border, in particular, may cite domestic institutions more than foreign institutions even if the foreign institutions are geographically closer. A country self-citation is defined as a citation between two domestic institutions. Country self-citation is deemed an effective tool for measuring the inclination of institutions in one country to cite other domestic institutions. Therefore, the null and research hypotheses are:


• H2.0: There is no relationship between country boundary and institution citation connectivity; and

• H2.1: There is a demonstrable relationship between country boundary and institution citation connectivity.

In addition, an indicator, the Distance Weighted Citation Score (DWCS), is proposed to measure the physical impact of institutions. It weights each citation by the distance between the citing and cited institutions:

$$\mathrm{DWCS}_i = \sum_{k \in C_i} c_{ki} \, d_{ik} \qquad (1)$$

where $C_i$ is the set of institutions citing i, $c_{ki}$ is the number of citations from institution k to institution i, and $d_{ik}$ is the physical distance between i and k (in kilometers). Suppose there are three institutions located in the same region and receiving the same number of citations; A is only cited by itself, B is only cited by local institutions (in the same city, same state/province, etc.), and C is cited both by local and international institutions. A would have a DWCS of 0, B would have a small DWCS, and C would have the highest DWCS as it has a farther-reaching impact.
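The three-institution example can be checked with a small sketch. It reads the DWCS description as a sum, over citing institutions, of the number of citations multiplied by the distance to the cited institution; the function name `dwcs`, the toy citation counts, and the distances are all hypothetical.

```python
def dwcs(institution, link_counts, distance_km):
    """Distance Weighted Citation Score of one institution.

    link_counts maps (citing, cited) -> number of citations;
    distance_km(a, b) returns the distance between two institutions in kilometers.
    """
    total = 0.0
    for (citing, cited), n in link_counts.items():
        if cited == institution:
            total += n * distance_km(citing, cited)
    return total


# Hypothetical distances (km) between institutions; same institution -> 0.
toy_distances = {
    frozenset(["B", "local_1"]): 20.0,
    frozenset(["C", "local_2"]): 20.0,
    frozenset(["C", "abroad"]): 6000.0,
}


def toy_distance_km(a, b):
    return 0.0 if a == b else toy_distances[frozenset([a, b])]


# A, B, and C each receive ten citations in total.
toy_links = {
    ("A", "A"): 10,        # A is cited only by itself
    ("local_1", "B"): 10,  # B is cited only by a local institution
    ("local_2", "C"): 5,   # C is cited locally ...
    ("abroad", "C"): 5,    # ... and internationally
}
for name in ("A", "B", "C"):
    print(name, dwcs(name, toy_links, toy_distance_km))
```

With these toy values the ordering matches the text: A scores 0, B scores 200, and C scores 30,100.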

Inspired by the work of Ponds, van Oort, and Frenken (2007) who studied the geographical distribution of collaborations among different types of organizations, the current study adopts the institution type as one independent variable and explores the citation intensity among institutions of different types, including universities, libraries, corporations, and governments. Therefore, the null and research hypotheses are:

• H3.0: There is no relationship between institution type and institution citation connectivity; and

• H3.1: There is a demonstrable relationship between institution type and institution citation connectivity.

Social relations comprise a spectrum of social interactions, such as collaboration relations, advisor-advisee relations, and conference co-participation relations. Among them, the collaboration relation is one of the most representative social relationships in scholarship. Previous studies on proximity focused on the relationship between collaboration intensity and geographical distance (e.g. Hoekman, Frenken, & van Oort, 2009). Yet, little is known about the relationship between collaboration intensity and citation connectivity. In the present study, the number of citations is considered as the dependent variable, and the number of collaborations between a pair of institutions is considered as one of the independent variables. Their relation is tested through a linear regression model. Therefore, the null and research hypotheses are:

• H4.0: There is no relationship between collaboration intensity and institution citation connectivity; and

• H4.1: There is a demonstrable relationship between collaboration intensity and institution citation connectivity.

As a norm, a research article usually cites previous work on similar topics to outline the connection of the current study with previous efforts. Topic similarity at the institutional level is not fully explored, and hence it is necessary to know whether topicality is also an evident factor that relates to institutional citation behaviors. In order to quantify the topic similarity between institutions, the Author-Conference-Topic (ACT) Model (Tang et al., 2008) was used. The underlying idea of the ACT Model is that if two articles share more title (or abstract) words, they have a higher probability of being within the same research topic. The similarity measurement can also be extended to institutions: if two institutions publish articles with similar title words, then they are more likely to be in the same research topic area. The number of topics was set at ten, and thus each institution i received a topic probability distribution $\theta_i = (\theta_{i1}, \theta_{i2}, \ldots, \theta_{i10})$. The topic similarity (distance) between two institutions can be calculated through cosine similarity (distance). The cosine distance between two institutions i and j is given by

$$d^{\cos}_{ij} = 1 - \frac{\sum_{t=1}^{10} \theta_{it}\,\theta_{jt}}{\sqrt{\sum_{t=1}^{10} \theta_{it}^{2}}\;\sqrt{\sum_{t=1}^{10} \theta_{jt}^{2}}} \qquad (2)$$

Therefore, the null and research hypotheses are:

• H5.0: There is no relationship between topic distance and institution citation connectivity; and

• H5.1: There is a demonstrable relationship between topic distance and institution citation connectivity.
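As an illustration of the topical distance, the sketch below computes a cosine distance between two ten-topic probability vectors. It assumes that cosine distance means one minus cosine similarity; the function name `cosine_distance` and the toy distributions are hypothetical and do not come from the ACT Model output.

```python
import math


def cosine_distance(p, q):
    """One minus the cosine similarity of two topic distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("topic distributions must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


# Hypothetical ten-topic distributions for three institutions.
inst_x = [0.55, 0.25, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01]
inst_y = [0.50, 0.30, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01]
inst_z = [0.01, 0.01, 0.02, 0.02, 0.02, 0.02, 0.05, 0.05, 0.25, 0.55]

print(cosine_distance(inst_x, inst_y))  # similar topic profiles: close to 0
print(cosine_distance(inst_x, inst_z))  # different topic profiles: much larger
```

Because topic probabilities are non-negative, the distance lies between 0 (same topical profile) and 1 (no shared topics).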

Scientific research has become an international event. With the help of the Internet and population mobility, scholars are now collaborating with each other at an unprecedented distance. Different from collaborative authorship, one does not need to know the cited authors in order to cite them. As a result, it is expected that the prevalence of online databases would break down the geographical boundaries (Börner et al., 2006; Lee et al., 2010). This belief is tested through studying the dynamics of institutional citation distance. Therefore, the last research hypothesis can be derived and proposed:

• H6.0: There is no relationship between citation time and institution citation connectivity; and

• H6.1: There is a demonstrable relationship between citation time and institution citation connectivity.

A linear model is constructed which directly tests hypotheses 1, 4, 5, and 6. The dependent variable in the model is the number of citations between pairs of institutions (y). The independent variables are the number of collaborations (x1), distance in kilometers (x2), and topical distance (x3) between pairs of institutions. The Box-Tidwell method was first used to optimize the independent variables’ exponents. A generalized linear model (GLM) was then applied to the data.

Since the number of citations is discrete, the citation count y is better fitted by a Poisson distribution within the GLM framework:

$$P(Y_i = y_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!} \qquad (3)$$

where

$$\mu_i = E(Y_i) \qquad (4)$$

The log likelihood is

$$\ell(\beta) = \sum_i \left( y_i \log \mu_i - \mu_i - \log y_i! \right) \qquad (5)$$

with

$$\log \mu_i = \log n_i + x_i^{T}\beta \qquad (6)$$

where the canonical log link is used in formula (6). $n_i$ is the number of “incidents” in a period, and in this context it is the size of outgoing citations of the citing institution in citation pair i. By adding $\log n_i$, the sizes of institutions are considered and controlled. The parameter $\beta$ can be solved through maximum likelihood estimation. Newton-Raphson (iteratively reweighted least squares) gives

$$\beta^{(m+1)} = \left( X^{T} W X \right)^{-1} X^{T} W z \qquad (7)$$

where $W = \mathrm{diag}(\mu_i)$ and $z_i = x_i^{T}\beta^{(m)} + (y_i - \mu_i)/\mu_i$, both evaluated at $\beta^{(m)}$. For the four data sets in this study, the stabilized parameter can be attained after around 5 iterations.
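The estimation procedure can be illustrated with a short NumPy sketch of a Poisson regression with a log link and a log-exposure offset, fitted by iteratively reweighted least squares. This is a generic textbook formulation under stated assumptions, not the code used in the study: the function name `fit_poisson_irls`, the synthetic covariates, the coefficient values, and the exposure range are all made up for the example, and the Box-Tidwell transformation step is omitted.

```python
import numpy as np


def fit_poisson_irls(X, y, exposure, max_iter=50, tol=1e-10):
    """Fit log(mu) = log(exposure) + X @ beta by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.log(np.asarray(exposure, dtype=float))
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        linear_part = X @ beta
        mu = np.exp(offset + linear_part)
        # Working response with the offset removed; weights are W = diag(mu).
        z = linear_part + (y - mu) / mu
        XtW = X.T * mu
        new_beta = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Synthetic data (illustration only): an intercept plus three covariates standing
# in for collaborations, physical distance, and topical distance.
rng = np.random.default_rng(0)
n_pairs = 5000
X = np.column_stack([np.ones(n_pairs), rng.normal(size=(n_pairs, 3))])
true_beta = np.array([-3.0, 0.4, -0.2, -0.5])
exposure = rng.integers(5, 200, size=n_pairs)  # outgoing citations of the citing side
y = rng.poisson(exposure * np.exp(X @ true_beta))

beta_hat = fit_poisson_irls(X, y, exposure)
print(np.round(beta_hat, 3))
```

The offset term plays the role of the size control described above: institutions that cite more in total are expected to cite any given partner more often, and the coefficients describe the citation rate relative to that baseline.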

Results

Knowledge exporters in LIS

The number of incoming citations was calculated for institutions in each of the four time periods. Utilizing diachronic analysis, three types of institutions can be identified: the mainstays of LIS, institutions that were highly cited (top 20) in three of the four time periods (Table 3); declining institutions, which were highly cited (top 20) in the first time period but lost rankings over time (Table 4); and rising institutions, which ranked in the top 20 in the most recent time period but did not rank in the top 20 in previous time periods (Table 5).

Table 3. Mainstay institutions in LIS

                                           1965-1990         1991-2000         2001-2005         2006-2010
Institution name                       Citation     Rank Citation     Rank Citation     Rank Citation     Rank
HUNGARIAN ACAD SCI,HUNGARY                  118        8      432        4      515        4      950        2
UNIV GEORGIA,ATHENS                          27       56      207       15      430        8      764        3
UNIV MINNESOTA,MINNEAPOLIS                   72       22      427        5      502        5      724        4
UNIV WESTERN ONTARIO,CANADA                  65       23      310        9      472        6      698        5
INDIANA UNIV,BLOOMINGTON                     78       19      375        6      549        3      646        6
UNIV SHEFFIELD,ENGLAND                      128        4      468        3      648        1      569       10
UNIV MICHIGAN,ANN ARBOR                      92       16      332        7      356       13      558       12
DREXEL UNIV,PHILADELPHIA                    124        5      158       24      286       16      550       13
ROYAL SCH LIB & INFORMAT SCI,DENMARK         29       53      266       11      593        2      498       16
LEIDEN UNIV,NETHERLANDS                      26       60      323        8      386       11      494       17
UNIV ARIZONA,TUCSON                          21       79      194       18      246       19      433       18
UNIV ILLINOIS,URBANA                        236        2      473        2      370       12      423       20
CITY UNIV LONDON,ENGLAND                    109        9      305       10      399       10      395       21
UNIV N CAROLINA,CHAPEL HILL                  99       12      265       12      291       15      362       25

Fourteen institutions rank in the top 20 for at least three of the four time slices and are deemed the most influential institutions within LIS. Two groups of institutions can be seen here: MIS (management information systems) and LIS (particularly iSchools). Within the LIS institutions, those whose faculty members publish in scientometrics and informetrics feature prominently (e.g., HUNGARIAN ACAD SCI and LEIDEN UNIV). Nine of the 14 institutions are located in North America and the rest are located in Europe.

Table 4. Declining institutions in LIS

| Institution name | Citation 1965-1990 | Rank 1965-1990 | Citation 1991-2000 | Rank 1991-2000 | Citation 2001-2005 | Rank 2001-2005 | Citation 2006-2010 | Rank 2006-2010 |
|---|---|---|---|---|---|---|---|---|
| INST SCI INFORMAT,PHILADELPHIA | 417 | 1 | 187 | 19 | 166 | 37 | 200 | 47 |
| UCL,ENGLAND | 149 | 3 | 131 | 31 | 73 | 93 | 106 | 108 |
| ACAD SCI USSR,USSR | 122 | 6 | 52 | 93 | 35 | 182 | 6 | 915 |
| UNIV CAMBRIDGE,ENGLAND | 120 | 7 | 88 | 52 | 97 | 67 | 131 | 83 |
| UNIV CALIF BERKELEY,BERKELEY | 109 | 9 | 212 | 13 | 110 | 56 | 117 | 102 |
| COLUMBIA UNIV,NEW YORK | 106 | 11 | 142 | 28 | 33 | 189 | 54 | 185 |
| UNIV CHICAGO,CHICAGO | 98 | 13 | 114 | 38 | 55 | 122 | 53 | 192 |
| BRITISH LIB,ENGLAND | 98 | 13 | 60 | 76 | 29 | 210 | 12 | 599 |
| UNIV ASTON,ENGLAND | 94 | 15 | 156 | 25 | 98 | 65 | 97 | 116 |
| BOSTON SPA,ENGLAND | 79 | 18 | 51 | 96 | 19 | 287 | 14 | 532 |

Table 4 displays institutions that rank in the top 20 in the 1965-1990 time period but then decline in prominence. As can be seen, some British institutions were highly cited in 1965-1990 but attracted fewer citations and have lower ranks in recent time periods. In addition, schools that offered some of the initial doctoral programs in LIS but closed in the 1990s or earlier appear here (e.g., UNIV CHICAGO and COLUMBIA UNIV).

Table 5. Rising institutions in LIS

| Institution name | Citation 1965-1990 | Rank 1965-1990 | Citation 1991-2000 | Rank 1991-2000 | Citation 2001-2005 | Rank 2001-2005 | Citation 2006-2010 | Rank 2006-2010 |
|---|---|---|---|---|---|---|---|---|
| GEORGIA STATE UNIV,ATLANTA | 5 | 296 | 91 | 49 | 435 | 7 | 1,324 | 1 |
| FLORIDA STATE UNIV,TALLAHASSEE | 21 | 79 | 61 | 75 | 241 | 21 | 639 | 7 |
| UNIV BRITISH COLUMBIA,CANADA | 22 | 75 | 88 | 52 | 257 | 17 | 623 | 8 |
| UNIV OKLAHOMA,NORMAN | 58 | 28 | 68 | 67 | 184 | 29 | 583 | 9 |
| UNIV MARYLAND,COLLEGE PK | 11 | 143 | 81 | 57 | 242 | 20 | 569 | 10 |
| KATHOLIEKE UNIV LEUVEN,BELGIUM | 0 | NA | 1 | 1,050 | 154 | 43 | 519 | 14 |
| UNIV S FLORIDA,TAMPA | 0 | NA | 21 | 207 | 122 | 52 | 500 | 15 |

Table 5 shows the institutions which have gained popularity in recent years, as shown through incoming citations. As demonstrated in Table 5, MIS institutions are better represented in recent years, with institutions such as GEORGIA STATE UNIV growing in prominence from a rank of 296 in the first time period to the highest ranked institution in the most recent time period.

Institutions’ interaction via citations

Table 6 shows the top 5 citing and cited institutions for the highly cited institutions. Citing institutions denote institutions that cite the target institutions. Cited institutions denote institutions that were cited by the target institutions.

Table 6. Citing and cited institutions

HUNGARIAN ACAD SCI,HUNGARY (No. of citations: 2,005)

| Top 5 citing institute | No. | % | Top 5 cited institute | No. | % |
|---|---|---|---|---|---|
| HUNGARIAN ACAD SCI,HUNGARY | 363 | 18.10% | HUNGARIAN ACAD SCI,HUNGARY | 363 | 30.97% |
| KATHOLIEKE UNIV LEUVEN,BELGIUM | 135 | 6.73% | KATHOLIEKE UNIV LEUVEN,BELGIUM | 62 | 5.29% |
| LEIDEN UNIV,NETHERLANDS | 48 | 2.39% | L EOTVOS UNIV,HUNGARY | 61 | 5.20% |
| NATL INST SCI TEC & DEV STUDIES,INDIA | 33 | 1.65% | RES ASSOC SCI COMMUN & INF EV,DE | 45 | 3.84% |
| BAR ILAN UNIV,ISRAEL | 31 | 1.55% | LEIDEN UNIV,NETHERLANDS | 45 | 3.84% |

THE SCIENTIST,PHILADELPHIA (No. of citations: 1,855)

| Top 5 citing institute | No. | % | Top 5 cited institute | No. | % |
|---|---|---|---|---|---|
| THE SCIENTIST,PHILADELPHIA | 839 | 45.23% | THE SCIENTIST,PHILADELPHIA | 839 | 94.91% |
| ROCKEFELLER UNIV,NEW YORK | 30 | 1.62% | INST SCI INFORMAT,PHILADELPHIA | 7 | 0.79% |
| UNIV PENN,PHILADELPHIA | 28 | 1.51% | SUNY ALBANY,ALBANY | 7 | 0.79% |
| AI DUPONT INST CHILDREN HOSP,DE | 22 | 1.19% | UNIV GEORGIA,ATHENS | 5 | 0.57% |
| FAIRTEST,CAMBRIDGE | 22 | 1.19% | ROCKEFELLER UNIV,NEW YORK | 2 | 0.23% |

GEORGIA STATE UNIV,ATLANTA (No. of citations: 1,854)

| Top 5 citing institute | No. | % | Top 5 cited institute | No. | % |
|---|---|---|---|---|---|
| GEORGIA STATE UNIV,ATLANTA | 125 | 6.74% | GEORGIA STATE UNIV,ATLANTA | 125 | 9.40% |
| UNIV BRITISH COLUMBIA,CANADA | 41 | 2.21% | UNIV MINNESOTA,MINNEAPOLIS | 60 | 4.51% |
| CITY UNIV HONG KONG,P.R. CHINA | 33 | 1.78% | UNIV GEORGIA,ATHENS | 46 | 3.46% |
| UNIV ARKANSAS,FAYETTEVILLE | 31 | 1.67% | UNIV BRITISH COLUMBIA,CANADA | 28 | 2.11% |
| NATL UNIV SINGAPORE,SINGAPORE | 29 | 1.56% | QUEENS UNIV,CANADA | 23 | 1.73% |

UNIV SHEFFIELD,ENGLAND (No. of citations: 1,812)

| Top 5 citing institute | No. | % | Top 5 cited institute | No. | % |
|---|---|---|---|---|---|
| UNIV SHEFFIELD,ENGLAND | 252 | 13.91% | UNIV SHEFFIELD,ENGLAND | 252 | 26.17% |
| INDIANA UNIV,BLOOMINGTON | 46 | 2.54% | CITY UNIV LONDON,ENGLAND | 54 | 5.61% |
| UNIV TAMPERE,FINLAND | 45 | 2.48% | ROYAL SCH LIB & INF SCI,DENMARK | 29 | 3.01% |
| CITY UNIV LONDON,ENGLAND | 42 | 2.32% | UNIV ASTON,ENGLAND | 29 | 3.01% |
| ROYAL SCH LIB & INF SCI,DENMARK | 38 | 2.10% | UNIV CAMBRIDGE,ENGLAND | 25 | 2.60% |

UNIV MINNESOTA,MINNEAPOLIS (No. of citations: 1,725)

| Top 5 citing institute | No. | % | Top 5 cited institute | No. | % |
|---|---|---|---|---|---|
| GEORGIA STATE UNIV,ATLANTA | 60 | 3.48% | UNIV MINNESOTA,MINNEAPOLIS | 39 | 7.98% |
| UNIV MINNESOTA,MINNEAPOLIS | 39 | 2.26% | UNIV TEXAS,AUSTIN | 13 | 2.66% |
| NATL UNIV SINGAPORE,SINGAPORE | 33 | 1.91% | UNIV MICHIGAN,ANN ARBOR | 11 | 2.25% |
| UNIV GEORGIA,ATHENS | 29 | 1.68% | MIT,CAMBRIDGE | 11 | 2.25% |
| UNIV ARKANSAS,FAYETTEVILLE | 25 | 1.45% | HARVARD UNIV,CAMBRIDGE | 9 | 1.84% |

INDIANA UNIV,BLOOMINGTON (No. of citations: 1,646)

| Top 5 citing institute | No. | % | Top 5 cited institute | No. | % |
|---|---|---|---|---|---|
| INDIANA UNIV,BLOOMINGTON | 160 | 9.72% | INDIANA UNIV,BLOOMINGTON | 160 | 9.29% |
| WOLVERHAMPTON UNIV,ENGLAND | 55 | 3.34% | UNIV ILLINOIS,URBANA | 48 | 2.79% |
| UNIV WESTERN ONTARIO,CANADA | 38 | 2.31% | UNIV SHEFFIELD,ENGLAND | 46 | 2.67% |
| UNIV CALIF LOS ANGELES,LOS ANGELES | 32 | 1.94% | RUTGERS STATE UNIV,NEW BRUNSWICK | 33 | 1.92% |
| UNIV ILLINOIS,URBANA | 31 | 1.88% | UNIV N CAROLINA,CHAPEL HILL | 31 | 1.80% |

UNIV WESTERN ONTARIO,CANADA (No. of citations: 1,545)

| Top 5 citing institute | No. | % | Top 5 cited institute | No. | % |
|---|---|---|---|---|---|
| UNIV WESTERN ONTARIO,CANADA | 164 | 10.61% | UNIV WESTERN ONTARIO,CANADA | 164 | 12.07% |
| UNIV WISCONSIN,MILWAUKEE | 37 | 2.39% | WOLVERHAMPTON UNIV,ENGLAND | 39 | 2.87% |
| UNIV WASHINGTON,SEATTLE | 33 | 2.14% | INDIANA UNIV,BLOOMINGTON | 38 | 2.80% |
| INDIANA UNIV,BLOOMINGTON | 25 | 1.62% | ROYAL SCH LIB & INF SCI,DENMARK | 36 | 2.65% |
| UNIV ALBERTA,CANADA | 24 | 1.55% | UNIV SHEFFIELD,ENGLAND | 30 | 2.21% |

UNIV ILLINOIS,URBANA (No. of citations: 1,501)

| Top 5 citing institute | No. | % | Top 5 cited institute | No. | % |
|---|---|---|---|---|---|
| UNIV ILLINOIS,URBANA | 70 | 4.66% | UNIV ILLINOIS,URBANA | 70 | 5.99% |
| INDIANA UNIV,BLOOMINGTON | 48 | 3.20% | INDIANA UNIV,BLOOMINGTON | 31 | 2.65% |
| UNIV WESTERN ONTARIO,CANADA | 30 | 2.00% | UNIV ILLINOIS,CHICAGO | 21 | 1.80% |
| UNIV ARIZONA,TUCSON | 25 | 1.67% | UNIV SHEFFIELD,ENGLAND | 18 | 1.54% |
| UNIV N CAROLINA,CHAPEL HILL | 24 | 1.60% | UNIV CALIF LOS ANGELES,LOS ANGELES | 18 | 1.54% |

UNIV GEORGIA,ATHENS (No. of citations: 1,428)

| Top 5 citing institute | No. | % | Top 5 cited institute | No. | % |
|---|---|---|---|---|---|
| GEORGIA STATE UNIV,ATLANTA | 46 | 3.22% | UNIV GEORGIA,ATHENS | 33 | 6.68% |
| UNIV GEORGIA,ATHENS | 33 | 2.31% | UNIV MINNESOTA,MINNEAPOLIS | 29 | 5.87% |
| UNIV BRITISH COLUMBIA,CANADA | 31 | 2.17% | GEORGIA STATE UNIV,ATLANTA | 18 | 3.64% |
| INDIANA UNIV,BLOOMINGTON | 28 | 1.96% | INDIANA UNIV,BLOOMINGTON | 15 | 3.04% |
| DREXEL UNIV,PHILADELPHIA | 25 | 1.75% | FLORIDA STATE UNIV,TALLAHASSEE | 12 | 2.43% |

ROYAL SCH LIB & INFORMAT SCI,DENMARK (No. of citations: 1,384)

| Top 5 citing institute | No. | % | Top 5 cited institute | No. | % |
|---|---|---|---|---|---|
| ROYAL SCH LIB & INF SCI,DENMARK | 201 | 14.52% | ROYAL SCH LIB & INF SCI,DENMARK | 201 | 22.26% |
| WOLVERHAMPTON UNIV,ENGLAND | 137 | 9.90% | UNIV SHEFFIELD,ENGLAND | 38 | 4.21% |
| UNIV WESTERN ONTARIO,CANADA | 36 | 2.60% | DANMARKS BIBLIOTEKSSKOLE,DK | 26 | 2.88% |
| UNIV GRANADA,SPAIN | 32 | 2.31% | INDIANA UNIV,BLOOMINGTON | 26 | 2.88% |
| UNIV SHEFFIELD,ENGLAND | 29 | 2.10% | LEIDEN UNIV,NETHERLANDS | 25 | 2.77% |

Several patterns can be detected from Table 6. The citing institutions and cited institutions are quite dissimilar for the 10 target institutions. Although later analysis reveals that LIS institutions can be clustered, compact schools of thought or “invisible colleges” are less visible. Ideally, institutions belonging to a certain school of thought would be expected to cite each other intensively; such institutions would have low conductance (Leskovec et al., 2008), meaning that they are densely connected within the group but loosely connected to other institutions.
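To make the notion of conductance concrete, the sketch below computes it for a group of nodes in a small weighted network, using the common definition of cut weight divided by the smaller of the two volumes (sums of weighted degrees). As a simplifying assumption, citations are treated as undirected weighted ties; the function name `conductance` and the six-node toy network are hypothetical.

```python
def conductance(edges, group):
    """Conductance of `group`: cut weight / min(volume inside, volume outside).

    edges: iterable of (u, v, weight) tuples describing undirected ties.
    group: iterable of node labels forming the candidate community.
    """
    group = set(group)
    cut = 0.0
    vol_in = 0.0
    vol_out = 0.0
    for u, v, w in edges:
        # Each edge contributes its weight to the degree of both endpoints.
        for node in (u, v):
            if node in group:
                vol_in += w
            else:
                vol_out += w
        # An edge crosses the boundary when exactly one endpoint is inside.
        if (u in group) != (v in group):
            cut += w
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0

# Toy network: two triangles (A-B-C and D-E-F) joined by a single tie C-D.
toy_edges = [
    ("A", "B", 1), ("B", "C", 1), ("A", "C", 1),
    ("D", "E", 1), ("E", "F", 1), ("D", "F", 1),
    ("C", "D", 1),
]
tight_group = conductance(toy_edges, {"A", "B", "C"})
loose_group = conductance(toy_edges, {"A", "D"})
```

In this toy case the triangle {A, B, C} has a much lower conductance than the arbitrary pair {A, D}, which is the signature a compact school of thought would be expected to show.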

Institutional self-citation rates for top LIS institutions are around 10%. MIS institutions, such as GEORGIA STATE UNIV and UNIV MINNESOTA, tend to have lower self-citation rates, while informetric institutions, such as HUNGARIAN ACAD SCI and ROYAL SCH LIB & INFORMAT SCI, tend to have higher self-citation rates. MIS research is spread across many institutions because most business schools have MIS programs, and this breadth facilitates inter-institutional citations; by contrast, informetric research is concentrated in the handful of universities that have such a specialization, and consequently the presence of “citation circles” may be inevitable (Hendrix, 2009). The distribution of institutions thus affects institutional self-citation rates.

The relationship between physical distance and number of citations

For each citation pair (citing institution - cited institution), the number of citations and the physical distance between the two institutions are calculated. These two variables are plotted on a log-log kernel density plot. Density plots are preferred because each plot contains more than ten thousand instances, and in an ordinary scatter plot the dots would spread across the whole visualization and reduce readability. In kernel density plots, colors indicate distribution densities: the brighter the color, the more dots are distributed in that area.

Figure 1. Density plots: Number of citations vs. distance

R square is calculated from Spearman’s rank correlation coefficient. It suggests that citations and distance are negatively associated; however, the correlations are quite weak: distance accounts for only around 10% of the variance in citations. Dynamically, the citation distances are becoming longer: the kernel has gradually moved from 10^3 to 10^4 kilometers.
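As an illustration of how an R square value can be derived from a rank correlation, the sketch below computes Spearman's coefficient as the Pearson correlation of average ranks and then squares it. The helper names and the five toy citation pairs are hypothetical and are not taken from the data set.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical citation pairs: distance in km and number of citations.
distances_km = [100, 2500, 400, 9000, 50]
citations = [12, 3, 7, 1, 7]
rho = spearman_rho(distances_km, citations)
r_square = rho ** 2
```

A negative `rho` corresponds to the negative association described above, and `r_square` is the quantity compared across time periods.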

To explore how citations are distributed over different distance intervals, two variables, number of citations and distance between institutions, are mapped on a log-log plot. For every 200-kilometer (125-mile) increment, the number of citations for institutions within the range is accumulated, thus generating Figure 2.
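The accumulation step can be sketched as follows. This is only an illustration under stated assumptions: each citation pair is represented as a (distance in km, number of citations) tuple, intervals are half-open ranges starting at multiples of 200 km, and the function name and sample pairs are hypothetical.

```python
from collections import defaultdict

def citations_by_distance_bin(pairs, bin_km=200):
    """Sum citation counts within each interval [k * bin_km, (k + 1) * bin_km)."""
    totals = defaultdict(int)
    for distance_km, citations in pairs:
        bin_start = int(distance_km // bin_km) * bin_km
        totals[bin_start] += citations
    # Return the bins in increasing order of distance.
    return dict(sorted(totals.items()))

# Hypothetical citation pairs; a distance of 0 denotes an institutional self-citation.
sample_pairs = [(0, 30), (0, 12), (150, 10), (210, 5), (399.9, 2), (5600, 7)]
binned = citations_by_distance_bin(sample_pairs)
```

Plotting the bin start against the accumulated count on logarithmic axes would give a curve of the kind shown in Figure 2; the zero-distance bin would need special handling on a log axis.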

Figure 2. Plot of distance and number of citations

In this log-log plot, the highest number of citations occurs when the distance equals zero, indicating that these citations are institutional self-citations. As mentioned above, institutional self-citations comprise around 10% of total citations. The curve drops slightly and then reaches a plateau (approx. 200km to 1500km), indicating that most citations occur within this distance range. Checking the citing and cited institutions shows that the majority of these citations occur between institutions located in (1) USA east-USA east; (2) Europe-Europe; (3) USA west-USA west; and (4) USA (east and west)-Canada (east and west). The curve then declines from approximately 1500km to 4000km. Citations in this region are mainly attributed to (1) USA east-USA middle and (2) USA west-USA middle. The plunge ranges from 4000km to 6000km, which is the width of the Atlantic Ocean. Any citation between America and Europe inevitably needs to jump this physical divide. The second plateau begins at 6000km and extends to 10000km. Citations in this range are intercontinental in character, mainly including (1) USA (east, west, and middle)-Europe; (2) Europe-Asia (east); and (3) North America-South America. Above 10000km, citations become less common and can be explained by citations between Australia (also including New Zealand and some Southeast Asian countries such as Singapore) and Europe or the USA.

In general, the distribution curve is declining. Because the plot is on a logarithmic scale, the decline is in fact quite steep, suggesting that scholars in LIS are more inclined to cite other scholars who are physically closer to them. Topic distribution also contributes to this result. Two major topics of LIS studies, MIS and informetrics, are pursued mainly in the USA and Europe respectively, and citations therefore also follow this pattern. Dynamically, the distribution curves for the four time slices resemble each other, indicating that the advent of the Internet did not dramatically change the way scholars cite each other. Further scrutiny reveals that the curves are becoming flatter in recent time periods. Distance plays a less important role in recent years, but it is still a notable factor that is associated with institutional citation behavior.

DWCS is an indicator proposed to measure the physical impact of institutions. Table 7 shows the top institutions that have the highest physical impact for the 1965-2010 network in different regions.

Table 7. Top institutions based on DWCS

| Institute name | DWCS incoming | DWCS incoming per citation | DWCS outgoing | DWCS outgoing per citation | Citation incoming | Citation outgoing |
|---|---|---|---|---|---|---|
| USA and Canada East | | | | | | |
| GEORGIA STATE UNIV,ATLANTA | 8.04E+06 | 4,335 | 3.65E+06 | 2,745 | 1,854 | 1,330 |
| INDIANA UNIV,BLOOMINGTON | 5.98E+06 | 3,631 | 4.70E+06 | 2,727 | 1,646 | 1,723 |
| UNIV WESTERN ONTARIO,CANADA | 5.93E+06 | 3,838 | 4.34E+06 | 3,197 | 1,545 | 1,359 |
| UNIV GEORGIA,ATHENS | 5.22E+06 | 3,658 | 9.95E+05 | 2,015 | 1,428 | 494 |
| UNIV MICHIGAN,ANN ARBOR | 5.06E+06 | 3,788 | 1.55E+06 | 2,433 | 1,336 | 638 |
| Europe | | | | | | |
| UNIV SHEFFIELD,ENGLAND | 7.35E+06 | 4,055 | 2.70E+06 | 2,799 | 1,812 | 963 |
| HUNGARIAN ACAD SCI,HUNGARY | 6.42E+06 | 3,203 | 2.88E+06 | 2,458 | 2,005 | 1,172 |
| ROYAL SCH LIB & INFORMAT SCI,DENMARK | 4.83E+06 | 3,486 | 2.62E+06 | 2,905 | 1,384 | 903 |
| LEIDEN UNIV,NETHERLANDS | 4.12E+06 | 3,371 | 1.47E+06 | 2,336 | 1,223 | 630 |
| CITY UNIV LONDON,ENGLAND | 3.55E+06 | 2,945 | 2.71E+06 | 2,927 | 1,207 | 926 |
| USA Middle | | | | | | |
| UNIV MINNESOTA,MINNEAPOLIS | 6.19E+06 | 3,589 | 1.09E+06 | 2,232 | 1,725 | 489 |
| UNIV ILLINOIS,URBANA | 4.61E+06 | 3,071 | 3.14E+06 | 2,691 | 1,501 | 1,168 |
| UNIV OKLAHOMA,NORMAN | 3.69E+06 | 4,131 | 1.91E+06 | 3,130 | 893 | 609 |
| USA and Canada West | | | | | | |
| UNIV BRITISH COLUMBIA,CANADA | 4.85E+06 | 4,898 | 2.77E+06 | 3,752 | 990 | 737 |
| UNIV CALIF LOS ANGELES,LOS ANGELES | 3.76E+06 | 4,492 | 3.61E+06 | 4,537 | 838 | 796 |
| UNIV ARIZONA,TUCSON | 3.68E+06 | 4,115 | 2.62E+06 | 3,470 | 893 | 754 |
| Oceania and Southeast Asia | | | | | | |
| NATL UNIV SINGAPORE,SINGAPORE | 5.56E+06 | 10,768 | 9.02E+06 | 12,647 | 516 | 713 |
| VICTORIA UNIV WELLINGTON,NEW ZEALAND | 4.84E+06 | 14,712 | 3.23E+06 | 13,199 | 329 | 245 |
| UNIV AUCKLAND,NEW ZEALAND | 4.16E+06 | 13,209 | 1.57E+06 | 12,337 | 315 | 127 |

To make the comparison more informative, the top institutions are listed by region. DWCS is a size-dependent indicator: it measures cumulatively how far an institution's influence on other institutions reaches. Per citation DWCS, on the other hand, is a size-independent indicator: it measures the average reach of impact. GEORGIA STATE UNIV has the highest DWCS and highest per citation DWCS in the USA and Canada East region, indicating that it is cited by both domestic and international institutions and has a far-reaching impact on other LIS institutions. In Europe, UNIV SHEFFIELD has the highest DWCS. The per citation DWCS of UNIV SHEFFIELD is above 4000km, suggesting that the majority of its citations come from places outside Europe. As for institutions in the USA Middle area, UNIV ILLINOIS has a per citation DWCS of around 3000km, suggesting that its influence is mainly domestic. It can also be noticed that institutions in Oceania and Southeast Asia tend to have higher per citation DWCS because they are farther away from the places where most research is conducted, i.e. North America and Europe. DWCS and per citation DWCS are therefore useful indicators for measuring the extent of citation impact.
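The two indicators can be illustrated with a short sketch. It assumes that DWCS is the sum, over citation pairs, of the number of citations multiplied by the distance in kilometers, and that per citation DWCS divides this sum by the total number of citations. This reading is consistent with Table 7, where, for example, 8.04E+06 divided by 1,854 incoming citations is close to the reported 4,335 km. The function names and the sample pairs are hypothetical.

```python
def dwcs(citation_pairs):
    """Distance-weighted citation sum: total of citations * distance (km)."""
    return sum(citations * distance_km for distance_km, citations in citation_pairs)

def per_citation_dwcs(citation_pairs):
    """Average distance per citation; returns 0.0 when there are no citations."""
    total_citations = sum(citations for _, citations in citation_pairs)
    if total_citations == 0:
        return 0.0
    return dwcs(citation_pairs) / total_citations

# Hypothetical incoming citations of one institution: (distance in km, citations).
incoming = [(0, 10), (500, 20), (6000, 5), (12000, 5)]
total_reach = dwcs(incoming)
average_reach = per_citation_dwcs(incoming)

# Consistency check against the GEORGIA STATE UNIV row of Table 7.
georgia_state_per_citation = 8.04e6 / 1854
```

Because the table reports DWCS rounded to three significant figures, the recomputed per citation value only approximately matches the reported one.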

The relationship between country boundary and number of citations

In addition to the separation caused by physical distance, country boundaries are another barrier that may hold back inter-country citations. National scientific regulations, mutual trust among domestic scholars, familiarity with the local research environment, language, etc. can all be factors that encourage institutions to cite other domestic institutions. Country self-citation is considered an effective measure of the extent of this boundary attachment.

Table 8. Country self-citation

| Time | No. of internal citations | No. of internal country self-citations | Ratio (%) |
|---|---|---|---|
| 1965-1990 | 8,494 | 5,358 | 63.08% |
| 1991-2000 | 25,957 | 14,388 | 55.43% |
| 2001-2005 | 35,500 | 16,934 | 47.70% |
| 2006-2010 | 58,451 | 24,414 | 41.77% |

As can be seen in Table 8, country self-citations account for around half of all the citations in the institution citation networks, suggesting that authors of one institution are more inclined to cite the work of other domestic authors. This result strongly indicates that country boundary is associated with institutional citation behavior. On the other hand, the ratio of country self-citations decreases from more than 60% in 1965-1990 to around 42% in 2006-2010, showing that inter-country citations have become more common in recent time periods. It may be the case that the Internet has created more occasions for publications by foreign authors to be exposed to local scholars, thus fostering common ground and trust between authors from different countries.
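The ratios in Table 8 follow directly from the two count columns, as the short sketch below shows. The counts are taken from Table 8; the variable and function names are hypothetical.

```python
# (internal citations, internal country self-citations) per period, from Table 8.
TABLE_8 = {
    "1965-1990": (8494, 5358),
    "1991-2000": (25957, 14388),
    "2001-2005": (35500, 16934),
    "2006-2010": (58451, 24414),
}

def self_citation_ratio(total, self_citations):
    """Country self-citation ratio as a percentage rounded to two decimals."""
    return round(100.0 * self_citations / total, 2)

ratios = {period: self_citation_ratio(total, own)
          for period, (total, own) in TABLE_8.items()}

# Change in percentage points between the first and the last period.
decline_points = round(ratios["1965-1990"] - ratios["2006-2010"], 2)
```

The same calculation applies to any table in which a subset of citations is reported alongside the total.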

The relationship between institution type and number of citations

Institution types are identified by institutions’ names; around 10% of the institutions cannot be classified as their names did not contain any key words that indicate the institution type.

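Table 9. Citations between institutions of different types (rows: citing institution type; columns: cited institution type)

1965-1990

| | UNIV | LIB | CORP | GOV | Citing* |
|---|---|---|---|---|---|
| UNIV | 5512 | 399 | 298 | 144 | 6353 |
| LIB | 253 | 87 | 34 | 9 | 383 |
| CORP | 246 | 34 | 68 | 4 | 352 |
| GOV | 73 | 13 | 17 | 8 | 111 |
| Cited** | 6084 | 533 | 417 | 165 | 7199 |

1991-2000

| | UNIV | LIB | CORP | GOV | Citing |
|---|---|---|---|---|---|
| UNIV | 17134 | 646 | 686 | 334 | 18800 |
| LIB | 411 | 129 | 14 | 11 | 565 |
| CORP | 332 | 15 | 58 | 9 | 414 |
| GOV | 363 | 18 | 21 | 87 | 489 |
| Cited | 18240 | 808 | 779 | 441 | 20268 |

2001-2005

| | UNIV | LIB | CORP | GOV | Citing |
|---|---|---|---|---|---|
| UNIV | 24577 | 824 | 651 | 374 | 26426 |
| LIB | 457 | 104 | 11 | 9 | 581 |
| CORP | 258 | 8 | 16 | 3 | 285 |
| GOV | 604 | 22 | 30 | 99 | 755 |
| Cited | 25896 | 958 | 708 | 485 | 28047 |

2006-2010

| | UNIV | LIB | CORP | GOV | Citing |
|---|---|---|---|---|---|
| UNIV | 41244 | 868 | 868 | 624 | 43604 |
| LIB | 465 | 80 | 8 | 11 | 564 |
| CORP | 858 | 11 | 38 | 18 | 925 |
| GOV | 559 | 23 | 12 | 76 | 670 |
| Cited | 43126 | 982 | 926 | 729 | 45763 |

*Citing: Number of citations distributed; **Cited: Number of citations received.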

UNIV-UNIV citations account for roughly 77% of the identified citations in 1965-1990 and for more than 84% in each later period (Table 9). LIB, CORP, and GOV are highly dependent on UNIV: they cite UNIV more than other institution types, and at the same time they are cited by UNIV the most. Apart from citations involving UNIV, LIB-LIB, CORP-CORP, and GOV-GOV generally have the highest number of citations among LIB, CORP, and GOV. The results show that UNIV-UNIV is the major form of citation in LIS and that institutions of other types are heavily dependent on UNIV. In addition, UNIV-UNIV citations are increasing in dominance: UNIV accounts for an increasing percentage of overall citing and cited documents over time.
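The shares discussed above can be recomputed from the citation matrices in Table 9. In the sketch below, rows are citing types and columns are cited types in the order UNIV, LIB, CORP, GOV; the counts are copied from Table 9, while the variable and function names are hypothetical.

```python
TYPES = ["UNIV", "LIB", "CORP", "GOV"]

# Citation matrices from Table 9 (rows: citing type, columns: cited type).
TABLE_9 = {
    "1965-1990": [[5512, 399, 298, 144], [253, 87, 34, 9],
                  [246, 34, 68, 4], [73, 13, 17, 8]],
    "1991-2000": [[17134, 646, 686, 334], [411, 129, 14, 11],
                  [332, 15, 58, 9], [363, 18, 21, 87]],
    "2001-2005": [[24577, 824, 651, 374], [457, 104, 11, 9],
                  [258, 8, 16, 3], [604, 22, 30, 99]],
    "2006-2010": [[41244, 868, 868, 624], [465, 80, 8, 11],
                  [858, 11, 38, 18], [559, 23, 12, 76]],
}

def citing_totals(matrix):
    """Row sums: citations distributed by each institution type."""
    return [sum(row) for row in matrix]

def cited_totals(matrix):
    """Column sums: citations received by each institution type."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]

def univ_univ_share(matrix):
    """Share of all identified citations that go from UNIV to UNIV."""
    return matrix[0][0] / sum(citing_totals(matrix))

shares = {period: univ_univ_share(m) for period, m in TABLE_9.items()}
```

The row and column sums reproduce the Citing and Cited margins of Table 9, which is a convenient check that the flattened table was read correctly.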

The relationship between collaboration intensity and number of citations

Collaboration and citation are different forms of communication: collaborations are social interactions that focus on mutual knowledge sharing, whereas citations are cognitive inheritance that focuses on unilateral knowledge transfer. This section explores the association between the two forms of communication. Note that institution self-collaboration and institution self-citation are not considered in this section.

Figure 3 uses Venn diagrams to illustrate the number of institutions in the citation network (red), the number of institutions in the collaboration network (green), and the number of overlapping institutions (blue).

Figure 3. Number of institutions in the citation networks and collaboration networks

The sizes of the citation networks and collaboration networks are similar. They gradually grew from around 1,000 in 1965-1990 to more than 3,000 in 2006-2010. The overlapping institutions occupy less than 20% of all the institutions in networks of both types. The results suggest that, for LIS institutions, there is an apparent deviation between the collaborating and cited institutions.

Table 10 reports the number of citations between coauthoring institutions and the total number of citations in each time period.

Table 10. Number of citations for cited and collaborated institutions

| Time | No. of internal citations | No. of internal citations by coauthored institutions | Ratio (%) |
|---|---|---|---|
| 1965-1990 | 8,494 | 424 | 4.99% |
| 1991-2000 | 25,957 | 1,732 | 6.67% |
| 2001-2005 | 35,500 | 3,795 | 10.69% |
| 2006-2010 | 58,451 | 5,979 | 10.23% |

No more than about 11% of the citation traffic in any time period occurs between institutions that have collaboration relationships; the majority of citations are formed between institutions without collaboration relations. Citations have a farther reach because one does not need to know the cited authors when citing their papers; collaborations, on the other hand, require some communication between the coauthors and are thus affected by social and physical constraints.

For those institutions that have both citation and collaboration relations, their number of citations and number of collaboration times are mapped in Figure 4.

Figure 4. Density plots: Number of citations vs. Number of collaboration times

For the institutions that have both collaboration and citation relations, the number of citations and the number of collaboration times are correlated. Figure 4 seems to contradict the findings in Table 10, yet both results can stand; in fact, they describe the relationship between citation and collaboration from two perspectives. Although most citations are not attributed to institutions that have collaboration relations, once two institutions build collaborations, their citation intensity is associated with their collaboration intensity.

The relationship between topical distance and number of citations

Figure 5 shows the density plot between number of citations and topic distance measured by cosine distance. A threshold of ten publications is used to screen out institutions with limited number of publications. Those institutions are undesirably inclined to have smaller topic distances with each other simply because there are insufficient publications to select the title words from, and thus they would skew the correlation results.
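Cosine distance between two institutions can be sketched as follows. As an assumption for illustration, each institution is represented by the raw frequencies of the words in its article titles; the exact term weighting used in the study is not restated here, and the function name and sample words are hypothetical.

```python
import math
from collections import Counter

def cosine_distance(words_a, words_b):
    """Return 1 - cosine similarity of two bags of title words."""
    freq_a, freq_b = Counter(words_a), Counter(words_b)
    # Only words shared by both institutions contribute to the dot product.
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no words at all: treat as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical title words pooled per institution.
institution_a = ["citation", "analysis", "citation"]
institution_b = ["citation", "network"]
institution_c = ["library", "services"]

d_ab = cosine_distance(institution_a, institution_b)
d_ac = cosine_distance(institution_a, institution_c)
```

Institutions with very few publications have short word lists, so their distances rest on little evidence; this is the motivation for the ten-publication threshold described above.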

Figure 5. Density plots: Number of citations vs. topic distance

Not surprisingly, except for the first time period, topic distance and number of citations are negatively correlated, indicating that institutions in LIS are more inclined to cite other institutions that work on similar research topics. However, the R square values show that the correlations are quite weak. This may seem puzzling, since authors are assumed to cite other articles on similar topics. It must be noted, however, that topic similarity is not the only reason one article cites another; methodological resemblance, data sharing, inheritance of a theoretical framework, etc. are all motivations for citing. Consequently, a citing article’s title does not necessarily have high similarity with the titles of all the cited articles. In addition, since articles are aggregated into institutions, authors’ research specialties can differ noticeably within one institution, and therefore the low correlation coefficients come as no surprise.

Linear regression models on institution citation connectivity

This section applies GLM to the selected data sets. The institutions covered in the data sets have at least one collaboration tie, one citation, and ten publications. Figure 6 shows the histograms of the independent variables and dependent variables where the numbers indicate the numbers of instances for each time period. For ease of presentation, the bins for the number of citations and number of collaborations are truncated at 10 times (the extreme values are not displayed).

Figure 6. Histograms for independent and dependent variables

The distributions of the number of citations, number of collaborations, and distance in kilometers show evident power-law features, suggesting that most institutions cite and/or collaborate with other institutions only a few times, and the citing and cited institutions usually are a few thousand kilometers away from each other. The distributions of topic distance are more Gaussian.

Table 11 shows the pseudo R square and likelihood ratio test for the fitted models in the four time periods. Pseudo R square is used because the models are fitted as GLMs rather than as standard linear models.

Table 11. Pseudo R square and likelihood ratio test for fitted models

| Time | Pseudo R Square | Likelihood ratio test (Pr(>\| \|)) |
|---|---|---|
| 1965-1990 | 0.0252 | <2E-16 |
| 1991-2000 | 0.0423 | <2E-16 |
| 2001-2005 | 0.1759 | <2E-16 |
| 2006-2010 | 0.2685 | <2E-16 |

The likelihood ratio test demonstrates that the model is significant for the data sets in all time periods. The pseudo R square shows that the models explain less than 30% of the total variance of the data. Indeed, institutional citation is a complex phenomenon involving many explicit and implicit social and scholarly factors. Some of these factors are sporadic, inconsistent, and unmeasurable. The three independent variables included in the model are the most apparent ones, yet we need to acknowledge the existence of other factors associated with institutional citation behaviors.

Table 12 shows the parameters for the significant models.

Table 12. Parameters for fitted models

| Period | Variable | Exponent (Box-Tidwell) | Estimate | Std. Error | z value | Pr(>\|z\|) | Signif. Codes |
|---|---|---|---|---|---|---|---|
| 1965-1990 | Intercept | - | 5.84E-01 | 1.59E-02 | 36.70 | <2E-16 | *** |
| 1965-1990 | No. of collaborations | 1.17 | 1.02E-01 | 7.14E-03 | 14.23 | <2E-16 | *** |
| 1965-1990 | Distance in kilometers | 1 | -2.63E-05 | 3.86E-06 | -6.81 | 9.59E-12 | *** |
| 1965-1990 | Topical distance | 1 | 5.34E-02 | 2.47E-02 | 2.16 | 0.0305 | * |
| 1991-2000 | Intercept | - | 1.44E+00 | 4.14E-03 | 346.74 | <2E-16 | *** |
| 1991-2000 | No. of collaborations | 1.36 | 2.09E-02 | 4.73E-04 | 44.08 | <2E-16 | *** |
| 1991-2000 | Distance in kilometers | 1 | -3.88E-05 | 7.83E-07 | -49.53 | <2E-16 | *** |
| 1991-2000 | Topical distance | 1 | -3.26E-01 | 7.55E-03 | -43.15 | <2E-16 | *** |
| 2001-2005 | Intercept | - | 1.61E+00 | 2.30E-03 | 699 | <2E-16 | *** |
| 2001-2005 | No. of collaborations | 1.82 | 1.35E-02 | 4.86E-05 | 278.20 | <2E-16 | *** |
| 2001-2005 | Distance in kilometers | 1 | 8.05E-05 | 2.69E-07 | 299.30 | <2E-16 | *** |
| 2001-2005 | Topical distance | 1 | -8.25E-01 | 4.10E-03 | -201.30 | <2E-16 | *** |
| 2006-2010 | Intercept | - | 1.58E+00 | 1.63E-03 | 970.26 | <2E-16 | *** |
| 2006-2010 | No. of collaborations | 1.55 | 1.48E-02 | 2.04E-05 | 728.15 | <2E-16 | *** |
| 2006-2010 | Distance in kilometers | 1 | -1.06E-05 | 2.27E-07 | -46.71 | <2E-16 | *** |
| 2006-2010 | Topical distance | 1 | -2.43E-01 | 1.33E-03 | -182.72 | <2E-16 | *** |

Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The maximum-likelihood estimates of the transformation parameters are computed by the Box-Tidwell method. Only the number of collaborations passed the significance test, and thus its transformed power is used. For the other independent variables, the original power (one) is used because they did not pass the significance test. As can be seen, all three independent variables in the linear model are significantly associated with the number of citations. Surprisingly, distance is positively correlated with the number of citations in 2001-2005. Checking the data reveals a highly influential outlier (WOLVERHAMPTON UNIV,ENGLAND-VICTORIA UNIV WELLINGTON,NEW ZEALAND) whose number of citations is 67 (ranked second in that period) and whose distance is 18,745 kilometers. If this outlier is removed, distance is negatively correlated with the number of citations. Another influential outlier is UNIV CHICAGO,CHICAGO-UNIV CAMBRIDGE,ENGLAND, which has a high number of citations but a long topical distance, and it also altered the sign of the estimated parameter. The number of collaborations has the most significant association with the number of citations, and the association (z value) is becoming more significant in recent periods.
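The z values and p-values in Table 12 are related to the estimates and standard errors in the usual way for a Wald test, which the sketch below illustrates for two rows of the 1965-1990 model. The inputs are the rounded figures printed in Table 12, so the recomputed values match the table only approximately; the function names are hypothetical.

```python
import math

def wald_z(estimate, std_error):
    """Wald statistic: the estimate divided by its standard error."""
    return estimate / std_error

def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# 1965-1990 rows of Table 12 (rounded values as printed).
z_intercept = wald_z(5.84e-01, 1.59e-02)
z_topical = wald_z(5.34e-02, 2.47e-02)
p_topical = two_sided_p(z_topical)
```

A larger absolute z value means the estimate is many standard errors away from zero, which is the sense in which the association with the number of collaborations is described as becoming more significant.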

Discussion

Exporter and importer dynamics

Through the dynamics of citation counts, several types of institutions are identified. The mainstay institutions resemble knowledge engines in that they generate knowledge and export it to other institutions. The mainstay institutions include some typical LIS institutions which have iSchools, such as INDIANA UNIV, UNIV MICHIGAN, DREXEL UNIV, UNIV ILLINOIS, and UNIV N CAROLINA. These institutions have a broader research focus in LIS. Meanwhile, institutions with a strong research focus on information systems are also well represented, such as UNIV GEORGIA, UNIV MINNESOTA, and UNIV ARIZONA. Unlike traditional LIS institutions, where the iSchools are responsible for the majority of publications on LIS, MIS departments affiliated with business schools produce the majority of publications on information system studies.

The declining institutions were core knowledge generators in LIS but became less productive and attracted fewer citations from other institutions. With the termination of library science programs in some institutions, institutions such as UNIV CALIF BERKELEY, COLUMBIA UNIV, and UNIV CHICAGO have faded from the attention of LIS scholars. The drop of English institutions is another apparent pattern in Table 4. It may suggest a shift of the LIS research center from England to the USA. The finding is further supported by the share of citations that all British institutions received in the four time slices: in 1965-1990 the British institutions received more than 19% of the total citations in LIS, but the share decreased over time, reaching only 9% in 2006-2010.

Factors associated with the institutional citation behaviors

The current study examines institutional citation behaviors from social, cognitive, and geographical perspectives. Physical distance, collaboration intensity, topic distance, institution type, and country boundary are investigated, and the first three of these variables are further analyzed via GLM.

UNIV-UNIV is the major form of citation in LIS; meanwhile, LIB, CORP, and GOV cite UNIV the most followed by institutions of their own types. Institution type is associated with institutional citation behavior; however, due to the dominant role of UNIV, such association is less evident for institutions of other types. Country self-citations were the major form of citations before 2000, and began to decrease and dropped to around 40% in the late 2000s. Correlation analysis shows that physical distance, collaboration intensity, and topic distance are significantly correlated with number of citations; therefore, the first, second, fourth, and fifth research hypotheses can be supported. However, the strengths of the associations differ: collaboration intensity has the highest correlation coefficient, followed by physical distance and topic distance.

The association between collaboration intensity and number of citations presents two different outcomes. On one hand, only a small portion of institutions maintain both collaboration relations and citation relations, suggesting that institutions’ social activities (in terms of collaborations) are not fully mapped onto institutions’ cognitive activities (in terms of citations). In choosing references, authors are less concerned with their social connection to cited authors and more concerned with topical and other factors. On the other hand, once two institutions establish collaboration relations (or citation relations), they have a higher probability of citing and collaborating with each other than with institutions with which they have no such relationships. This pattern is becoming more pronounced in recent time periods.

In the plot of distance against number of citations, the distribution curve declines steeply, suggesting that scholars in LIS are more inclined to cite other scholars who are physically proximate. However, this factor is becoming less distinct in recent time periods, which may be due to the increased visibility of articles made possible by the ubiquity of the Internet and online databases. Such databases may be partially responsible for changes in the way scholars cite each other, thus supporting the sixth hypothesis.
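As an illustration of how physical distance between two institutions might be quantified, the sketch below computes great-circle distance with the haversine formula. This is an assumption for illustration only: this section does not specify how distance was measured, and the function name and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical coordinates for two institutions (not taken from the paper).
inst_a = (40.0, -86.0)
inst_b = (52.0, 0.0)
distance_km = haversine_km(*inst_a, *inst_b)
print(f"{distance_km:.0f} km")
```

Pairwise distances of this kind could then be plotted against citation counts to inspect the decay pattern discussed above.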

The association between topic distance and number of citations is quite weak, as one article may cite others for reasons other than topical similarity. Furthermore, since articles are aggregated into institutions, the collective topics within an institution can differ noticeably from one another. Compared with lower-level research aggregates such as articles and authors, topic similarity is less evident for higher-level research aggregates such as institutions.

Conclusion

Using 45 years of data on LIS publications, several institution citation networks are constructed. These networks are used to examine institutions’ roles as knowledge exporters and importers, as well as to probe the institutions’ interactions via citations. A linear model is formulated in which the number of citations is the dependent variable, and the number of collaborations, physical distance, and topical distance are the independent variables.
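The form of such a model can be sketched with ordinary least squares. This is a simplified stand-in rather than the authors' GLM specification: the variable names, the synthetic data, and the coefficient values are assumptions for illustration, and any transformations applied in the study are not reproduced here.

```python
import numpy as np

def fit_linear_model(collaborations, physical_distance, topic_distance, citations):
    """Fit citations ~ intercept + collaborations + physical_distance + topic_distance."""
    X = np.column_stack([
        np.ones(len(citations)),  # intercept column
        collaborations,
        physical_distance,
        topic_distance,
    ]).astype(float)
    y = np.asarray(citations, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["intercept", "collaborations", "physical_distance", "topic_distance"]
    return dict(zip(names, coef))

# Synthetic, noise-free data with known (hypothetical) coefficients.
rng = np.random.default_rng(0)
n = 200
collab = rng.poisson(2.0, n)
dist = rng.uniform(0.0, 10.0, n)   # e.g. thousands of km
topic = rng.uniform(0.0, 1.0, n)
cites = 5.0 + 2.0 * collab - 0.3 * dist - 1.5 * topic

model = fit_linear_model(collab, dist, topic, cites)
for name, value in model.items():
    print(f"{name}: {value:.3f}")
```

Because the synthetic response is generated without noise, the fit recovers the chosen coefficients; with real citation counts, the sign and size of each coefficient would indicate the direction and strength of its association with citations.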

Institutional citation behavior is associated with social, topical, and geographical factors. This may imply that scholars in one institution are more inclined to cite others (1) with whom they have collaboration relationships; (2) whose research topics are more similar to their own; and (3) who are located within the same country and/or physically close to them. Dynamically, the advent of online databases may have contributed to a change in the way scholars cite each other: the number of citations is becoming more strongly associated with collaboration intensity and less dependent on the country boundary and/or physical distance.

The findings of this study indicate that future research on this topic would benefit from adding topics to dynamic research communities (i.e. groups of institutions) in order to determine how topics interact with communities and how communities may co-evolve with the topics they research.

Acknowledgements

The authors would like to thank Dr. Chunfeng Huang for his comments on the linear models, and Dr. Ying Ding as well as the anonymous reviewers for their comments on an early draft of this paper.

Cited works

Bergstrom, C. T., & West, J. D. (2008). Assessing citations with the Eigenfactor™ Metrics. Neurology, 71(23), 1850-1851.

Bollen, J., Rodriguez, M. A., & Van De Sompel, H. (2006). Journal status. Scientometrics, 69(3), 669-687.

Boschma, R.A. (2005). Proximity and innovation: A critical assessment. Regional Studies, 39(1), 61-74.

Börner, K., Chen, C., & Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology, 37(1), 179-255.

Börner, K., Penumarthy, S., Meiss, M., & Ke, W. (2006). Mapping the diffusion of scholarly knowledge among major U.S. research institutions. Scientometrics, 68(3), 415-426.

Boyack, K. W., Klavans, R., & Börner, K. (2005). Mapping the backbone of science. Scientometrics, 64(3), 351-374.

Breschi, S., & Lissoni, F. (2009). Mobility of skilled workers and co-invention networks: An anatomy of localized knowledge flow. Journal of Economic Geography, 9, 439-468.

Carvalho, R., & Batty, M. (2006). The geography of scientific productivity: Scaling in US computer science. Journal of Statistical Mechanics: Theory and Experiment, 10, DOI: 10.1088/1742-5468/2006/10/P10012

Case, D.O., & Miller, J.B. (2011). Do bibliometricians cite differently from other scholars? Journal of the American Society for Information Science & Technology, 62(3), 421-432.

Cronin, B., & Meho, L. I. (2008). The shifting balance of intellectual trade in information studies. Journal of the American Society for Information Science & Technology, 59(4), 551-564.

CWTS. (2010). Leiden Ranking 2010. Retrieved December 31, 2010 from http://www.socialsciences.leiden.edu/cwts/products-services/leiden-ranking-2010-cwts.html

Ding, Y., & Cronin, B. (2011). Popular and/or prestigious? Measures of scholarly esteem. Information Processing and Management, 47(1), 80-96.

Frenken, K., Hardeman, S., & Hoekman, J. (2009). Spatial scientometrics: Towards a cumulative research program. Journal of Informetrics, 3(3), 222-232.

Havemann, F., Heinz, M., & Kretschmer, H. (2006). Collaboration and distances between German immunological institutes – a trend analysis. Journal of Biomedical Discovery and Collaboration, 1(6).

Hendrix, D. (2009). Institutional self-citation rates: A three year study of universities in the United States. Scientometrics, 81(2), 321-331.

Hoekman, J., Frenken, K., & van Oort, F. (2009). The geography of collaborative knowledge production in Europe. Annals of Regional Science, 43, 721-738.

Hung, W.C., Lee, L.C., & Tsai, M.H. (2009). An international comparison of relative contributions to academic productivity. Scientometrics, 81(3), 703-718.

Katz, J.S. (1994). Geographical proximity and scientific collaboration. Scientometrics, 31(1), 31-43.

King, D.A. (2004). The scientific impact of nations: What different countries get for their research spending. Nature, 430, 311-316.

Klavans, R., & Boyack, K.W. (2010). Toward an objective, reliable and accurate method for measuring research leadership. Scientometrics, 82(3), 539-553.

Lee, K., Brownstein, J. S., Mills, R. G., & Kohane, I. S. (2010). Does collocation inform the impact of collaboration? PLoS ONE, 5(12), e14279. doi:10.1371/journal.pone.0014279

Leskovec, J., Lang, K. J., Dasgupta, A., & Mahoney, M. W. (2008). Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1), 29-123.

Leydesdorff, L., & Persson, O. (2010). Mapping the geography of science: Distribution patterns and networks of relations among cities and institutes. Journal of the American Society for Information Science & Technology, 61(8), 1622-1634.

Liang, L., & Zhu, L. (2002). Major factors affecting China’s inter-regional research collaboration: Regional scientific productivity and geographical proximity. Scientometrics, 55(2), 287-316.

Matthiessen, C.W., Schwarz, A.W., & Find, S. (2002). The top-level global research system, 1997-99: Centres, networks and nodality, an analysis based on bibliometric indicators. Urban Studies, 39(5-6), 903-927.

Molinari, J.-F., & Molinari, A. (2008). A new methodology for ranking scientific institutions. Scientometrics, 75(1), 163-174.

Morse, R., & Flanigan, S. (2010). How We Calculated the 2011 Graduate Schools Rankings. Retrieved December 31, 2010 from http://www.usnews.com/articles/education/best-graduate-schools/2010/04/15/how-we-calculated-the-2011-graduate-school-rankings.html

Nagpaul, P.S. (2003). Exploring a pseudo-regression model of transnational cooperation in science. Scientometrics, 56(3), 403-416.

Nejati, A., & Jenab, S.M.H. (2010). A two-dimensional approach to evaluate the scientific production of countries (case study: the basic sciences). Scientometrics, 84(2), 357-364.

Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45(2), 167-256.

Ponds, R., van Oort, F., & Frenken, K. (2007). The geographical and institutional proximity of research collaboration. Papers in Regional Science, 86(3), 423-443.

Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80(5), 056103.

SCImago (2007). SJR: SCImago Journal & Country Rank. Retrieved August 31, 2009 from http://www.scimagojr.com

Small, H. (1978). Cited documents as concept symbols. Social Studies of Science, 8, 327-340.

Stigler, S.M. (1994). Citation patterns in the journals of statistics and probability. Statistical Science, 9(1), 94-108.

Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., & Su, Z. (2008). ArnetMiner: Extraction and mining of academic social networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'2008). pp.990-998.

Torre, A., & Rallet, A. (2005). Proximity and localization. Regional Studies, 39(1), 1-13.

Van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523-538.

Van Raan, A.F.J. (2005). Fatal attraction: Conceptual and methodological problems in the ranking of universities by bibliometric methods. Scientometrics, 62(1), 133-143.

Veugelers, R. (2010). Towards a multipolar science world: trends and impact. Scientometrics, 82(2), 439-456.

Waltman, L., van Eck, N.J., & Noyons, E.C.M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4), 629-635.

Yan, E., & Ding, Y. (2011). Discovering author impact: A PageRank perspective. Information Processing and Management, 47(1), 125-134.

Zitt, M., Ramanana-Rahary, S., & Bassecoulard, E. (2003). Correcting glasses help fair comparisons in international science landscape: Country indicators as a function of ISI database delineation. Scientometrics, 56(2), 259-282.