On the Waring distribution, the Glänzel-Schubert model and their applications - a historiett

Timo Koski
Dept. of Math., KTH Royal Institute of Technology

Åbo, 22nd August 2013, Seminar in Honor of Göran Högnäs

August 20, 2013

Background

This lecture is based on

T. Koski, E. Sandström & U. Sandström (2011): Estimating Research Productivity from a Zero-Truncated Distribution. Proceedings of the 13th Conference of the International Society for Scientometrics and Informetrics, Vols 1 and 2, pp. 747-755.

This is a piece of argument about the Swedish government official report

Resurser för kvalitet (2007). Slutbetänkande av Resursutredningen. SOU 2007:81 [Resources for Quality. Final report of the Resource Inquiry],

where Ulf Sandström (Indek/KTH) and Erik Sandström contributed the bibliometric model. The report and its recommendations are also known under the acronym RUT 2.

The system for funding allocation to public research institutions presented by the Swedish government in October 2008 was based on RUT 2. Needless to say, this generated both public and private conflict of opinion, in which expressions like 'mathysteri' (roughly, 'math hysteria') were spotted. There will be more about the public conflict at the end of the lecture, if time permits.

Mathysteri ?

The Waring method

An important part of the bibliometric method invoked in RUT 2 is a statistical estimate of how many active researchers there are in the Nordic countries, an estimate made with what has come to be known (at least in Sweden) as the Waring method.

This estimate is needed to take into account the fact that different disciplines have different traditions of publishing, and of publishing in ISI journals (1).

(1) As listed by Thomson Reuters.

Statement by The Government Offices of Sweden

The proposal of the Resource Inquiry (= RUT 2, author's remark) means that the direct appropriations are allocated according to academia's own criteria for what constitutes good education and research, and according to the students' own informed choices. As a result, the state neither can nor should steer how the resources are distributed among the higher education institutions. It is therefore important that this model is managed and quality-assured by an academically well-qualified intermediary body outside the Government Offices.

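A report from an 'academically well-qualified intermediary body' ('akademiskt väl kvalificerat mellanliggande organ')

by J. Fröberg, M. Gunnarsson, A. Jonsson and S. Karlsson; Department of Research Policy Analysis (Avdelningen för forskningspolitisk analys), Swedish Research Council (Vetenskapsrådet).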

The critical comments in the cited report from the Department of Research Policy Analysis (Avdelningen för forskningspolitisk analys) at the Swedish Research Council (VR) are heavy. This talk is a rejoinder to some of them. Let us begin from the beginning.

Or how Edward Waring broke out of the academic universe

Edward Waring (1736-1798) held the Lucasian Chair of Mathematics in the University of Cambridge.

E. Waring, Miscellanea Analytica (1762): Waring's Formula

Expansion in inverse factorials, due to Edward Waring:

$$\frac{1}{x-\alpha} = \sum_{r=0}^{\infty} \frac{\alpha^{(r)}}{x^{(r+1)}}, \qquad x > \alpha > 0, \tag{1}$$

where $\alpha^{(r)}$ is the ascending factorial

$$\alpha^{(r)} = \alpha \cdot (\alpha+1) \cdots (\alpha+r-1) = \frac{\Gamma(\alpha+r)}{\Gamma(\alpha)},$$

where we used the well-known recursion formula of the Gamma function

$$\Gamma(z+1) = z\,\Gamma(z). \tag{2}$$

For derivations of (1), see

N.E. Nörlund: Vorlesungen über Differenzenrechnung. Verlag Julius Springer, Berlin, 1924, p. 261.

L.M. Milne-Thomson: The Calculus of Finite Differences. MacMillan, London, 1951, p. 291.
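As a quick numerical sanity check of the expansion (1), not part of the original slides, the sketch below sums the first terms of the series with the ascending factorials written as Gamma ratios. The function name `waring_partial_sum` and the values x = 5, α = 1 are illustrative assumptions.

```python
import math

def waring_partial_sum(x, alpha, n_terms):
    """Partial sum of sum_r alpha^(r) / x^(r+1), r = 0..n_terms-1.

    Ascending factorials are evaluated as Gamma ratios on the log scale.
    """
    total = 0.0
    for r in range(n_terms):
        log_num = math.lgamma(alpha + r) - math.lgamma(alpha)  # log alpha^(r)
        log_den = math.lgamma(x + r + 1) - math.lgamma(x)      # log x^(r+1)
        total += math.exp(log_num - log_den)
    return total

x, alpha = 5.0, 1.0  # illustrative values with x > alpha > 0
print(waring_partial_sum(x, alpha, 200), 1.0 / (x - alpha))
```

The two printed numbers should agree closely; convergence slows down as x - α becomes small.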

J.O. Irwin (1963): the Waring distribution

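Let us rewrite (1) with $\rho = x - \alpha$:

$$\frac{1}{\rho} = \sum_{r=0}^{\infty} \frac{\alpha^{(r)}}{(\rho+\alpha)^{(r+1)}}, \tag{3}$$

i.e.,

$$1 = \sum_{r=0}^{\infty} \rho \cdot \frac{\alpha^{(r)}}{(\rho+\alpha)^{(r+1)}}, \tag{4}$$

and we have discovered a probability distribution $(p_r)_{r=0}^{\infty}$ on the non-negative integers,

$$p_r = \rho \cdot \frac{\alpha^{(r)}}{(\rho+\alpha)^{(r+1)}}, \qquad r = 0, 1, 2, \ldots \tag{5}$$

J.O. Irwin: The place of mathematics in medical and biological sciences. J. R. Statistical Society, A, vol. 126, 1963, pp. 1-14.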

The Waring distribution: the recursion formula

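By the recursion formula (2) for $\Gamma(z)$, the Waring distribution $(p_r)_{r=0}^{\infty}$ satisfies

$$p_r = \begin{cases} \rho \cdot \dfrac{\alpha^{(0)}}{(\rho+\alpha)^{(1)}} = \rho \cdot \dfrac{\Gamma(\alpha)/\Gamma(\alpha)}{\Gamma(\alpha+\rho+1)/\Gamma(\alpha+\rho)} = \dfrac{\rho}{\alpha+\rho}, & r = 0, \\[2ex] \dfrac{\alpha+(r-1)}{\alpha+\rho+r}\, p_{r-1}, & r = 1, 2, \ldots \end{cases} \tag{6}$$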

The Waring distribution: the mean

Irwin found, amongst many other things, the mean $\mu$ of the distribution as

$$\mu = \frac{\alpha}{\rho-1} \quad \text{if } \rho > 1. \tag{7}$$

War(ρ, α)

We say that $X$ is a random variable that has the Waring distribution with parameters $\rho$ and $\alpha$ if

$$\Pr(X = k) = p_k, \qquad k = 0, 1, 2, \ldots,$$

with the $p_k$ given in (6). We state this as

$$X \sim \mathrm{War}(\rho, \alpha).$$
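The recursion (6) translates directly into code. The following minimal sketch (the function name `waring_pmf` and the parameter values are illustrative assumptions) builds the first probabilities of War(3, 1) and checks numerically that they sum to approximately one and that their mean is close to the value given by (7).

```python
def waring_pmf(rho, alpha, n_terms):
    """Return [p_0, ..., p_{n_terms-1}] of War(rho, alpha) via recursion (6)."""
    probs = [rho / (alpha + rho)]                      # p_0
    for r in range(1, n_terms):
        probs.append(probs[-1] * (alpha + r - 1) / (alpha + rho + r))
    return probs

rho, alpha = 3.0, 1.0                                  # illustrative values, rho > 1
probs = waring_pmf(rho, alpha, n_terms=20000)
total = sum(probs)                                     # slightly below 1 (tail is cut off)
mean = sum(r * p for r, p in enumerate(probs))         # compare with alpha / (rho - 1)
print(total, mean, alpha / (rho - 1.0))
```

Because the tail is heavy, the truncated mean approaches α/(ρ − 1) only slowly when ρ is close to 1.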

A weak form of Power Law

$$p_k = P(X = k) \approx k^{-(1+\rho)}, \qquad \text{as } k \to \infty. \tag{8}$$

We can call $\rho$ the tail parameter, as it controls the tail of the distribution.

[Figure: $p_k$ as a function of $k$ for War(3, 1) (blue) and War(2, 1) (green).]

W-C. Chen: On the weak form of Zipf's law. Journal of Applied Probability, 17, 1980, pp. 611-622.
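The tail behaviour in (8) can be illustrated numerically. In the sketch below (an illustration, not taken from the slides) the scaled probability $k^{1+\rho} p_k$ is printed for growing $k$ and compared with the constant $\rho\,\Gamma(\alpha+\rho)/\Gamma(\alpha)$ that follows from writing (5) with Gamma functions. The function name and parameter values are assumptions.

```python
import math

def waring_log_pmf(k, rho, alpha):
    """log p_k from (5), with ascending factorials written as Gamma ratios."""
    return (math.log(rho)
            + math.lgamma(alpha + k) - math.lgamma(alpha)
            + math.lgamma(alpha + rho) - math.lgamma(alpha + rho + k + 1))

rho, alpha = 3.0, 1.0  # illustrative values
limit = rho * math.exp(math.lgamma(alpha + rho) - math.lgamma(alpha))
for k in (10, 100, 1000, 10000):
    scaled = math.exp(waring_log_pmf(k, rho, alpha) + (1.0 + rho) * math.log(k))
    print(k, scaled, limit)
```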

Zipf-Lotka Law

$$P(X = k) = c \cdot k^{-2}, \qquad k = M, M+1, \ldots \tag{9}$$

where $c$ is the normalization constant. The Zipf-Lotka law was found empirically as a bibliometric distribution of the number of authors making $k$ contributions. The basic discovery of Lotka was that publication frequencies are skew distributions.

Yule - Simon Distribution

The state probabilities of birth-and-death processes are a source of power law distributions. G.U. Yule established a model (a pure birth process) to explain the observed size distribution of genera with respect to the number of species. Yule obtained a special case of the following probability mass function due to H.A. Simon:

$$q_k = \delta\, B(\delta+1, k), \qquad k = 1, 2, \ldots, \tag{10}$$

where $\delta > 0$ and $B(\delta+1, k)$ is the Beta function, i.e., $B(x, y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}$. It can be easily checked that if $X \sim \mathrm{War}(\rho, \alpha)$, then

$$\Pr(X = k \mid X > 0) \to \rho\, B(\rho+1, k)$$

as $\alpha \to 0$.
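The limit statement can be checked numerically. The sketch below (an illustration; the function names and the small value of α are assumptions) compares the zero-truncated Waring probabilities for a tiny α with the Yule-Simon probabilities (10) with δ = ρ.

```python
import math

def yule_simon_pmf(k, delta):
    """q_k = delta * B(delta + 1, k), k = 1, 2, ..., as in (10)."""
    log_beta = math.lgamma(delta + 1.0) + math.lgamma(k) - math.lgamma(delta + 1.0 + k)
    return delta * math.exp(log_beta)

def waring_pmf_given_positive(k, rho, alpha):
    """Pr(X = k | X > 0) for X ~ War(rho, alpha), k >= 1."""
    log_pk = (math.log(rho)
              + math.lgamma(alpha + k) - math.lgamma(alpha)
              + math.lgamma(alpha + rho) - math.lgamma(alpha + rho + k + 1))
    prob_positive = alpha / (alpha + rho)     # 1 - p_0, written to avoid cancellation
    return math.exp(log_pk) / prob_positive

rho, alpha = 2.0, 1e-6  # illustrative: alpha close to 0
for k in range(1, 6):
    print(k, waring_pmf_given_positive(k, rho, alpha), yule_simon_pmf(k, rho))
```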

Generating the Waring distribution

Hierarchic:

1. Draw $p$ from the Beta (prior) distribution with parameters $\alpha$ and $\rho$.
2. Draw a value of $X$ from the geometric distribution with parameter $p$.

Then $X \sim \mathrm{War}(\rho, \alpha)$. Thus the Waring distribution is known in Bayesian statistics as the Beta-Geometric distribution.

A. Schubert & W. Glänzel: A Dynamic Look at a Class of Skew Distributions. Scientometrics, vol. 6, no. 3, 1984, pp. 149-167.
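The two-step recipe can be sketched as a small simulation. This is an illustration under a stated assumption about conventions: $p \sim \mathrm{Beta}(\alpha, \rho)$ is taken as the probability that one more event occurs, so that $\Pr(X = r \mid p) = (1-p)\,p^r$ on $r = 0, 1, \ldots$; with this reading the mixture reproduces (5). The function name `sample_waring`, the seed and the parameter values are hypothetical.

```python
import random

def sample_waring(rho, alpha, rng):
    """One draw from War(rho, alpha) via the Beta-Geometric hierarchy.

    Assumed convention: p ~ Beta(alpha, rho) is the chance of one more event.
    """
    p = rng.betavariate(alpha, rho)
    k = 0
    while rng.random() < p:      # keep counting events until the first "stop"
        k += 1
    return k

rng = random.Random(2013)        # fixed seed so the run is reproducible
rho, alpha = 3.0, 1.0            # illustrative values
draws = [sample_waring(rho, alpha, rng) for _ in range(100_000)]
frac_zero = draws.count(0) / len(draws)       # compare with rho / (alpha + rho)
sample_mean = sum(draws) / len(draws)         # compare with alpha / (rho - 1)
print(frac_zero, rho / (alpha + rho))
print(sample_mean, alpha / (rho - 1.0))
```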

A.G.M. McKendrick on Modeling and Repetitive Events (1926)

In the majority of the processes with which one is concerned in the study of the medical sciences, one has to deal with assemblages of individuals, be they living or be they dead, which become affected according to some characteristic. They may meet and exchange ideas, the meeting may result in the transference of some infectious disease, and so forth.

The life of each individual consists of a train of such incidents, one following the other. From another point of view each member of the human community consists of an assemblage of cells.

These cells react and interact amongst each other, and each individual lives a life which may be again considered as a succession of events, one following the other. If one thinks of these individuals, be they human beings or be they cells, as moving in all sorts of dimensions, reversibly or irreversibly, continuously or discontinuously, by unit stages or per saltum, then the method of their movement becomes a study in kinetics, and can be approached by the methods ordinarily adopted in the study of such systems.

A.G.M. McKendrick: Applications of mathematics in medical problems. Proceedings of the Edinburgh Mathematical Society, 44, 1926, 98-130.

Glänzel & Schubert (1984): Postulates for repetitive events

New elements (with no occurrence) may enter the system at a rate proportional to the actual total number of elements in the system.

The chance for occurrence of the event grows linearly with the number of events already occurred (the (linear) Matthew effect: 'For whosoever hath, to him shall be given, and he shall have more abundance', Matthew 13:12, King James translation).

Elements have an equal chance to drop out of the system independently of the number of prior occurrences of the event.

An Infinite Array of Cells

$x_i$ = the number of elements in cell no. $i$, $\qquad x = \sum_{i=0}^{\infty} x_i$

The Postulates

$x_i$ = the content of cell no. $i$, $x = \sum_{i=0}^{\infty} x_i$.

NEW ELEMENTS: the rate of the external source is proportional to the total content:

$$s = \sigma x, \qquad \sigma > 0. \tag{11}$$

THE MATTHEW EFFECT: the higher the cell index, the more facile is further transfer:

$$f_i = (\alpha + \beta i)\,x_i, \qquad \alpha > 0, \ \beta \ge 0. \tag{12}$$

UNIFORM LEAKAGE: proportional to the cell content:

$$g_i = \gamma x_i, \qquad \gamma \ge 0. \tag{13}$$

The New Europe

In the setting of Glänzel and Schubert, $x_i$ is the frequency of authors (e.g., in some field of science in a country) with $i$ published papers. The postulates say that there is a cumulative advantage in higher levels of productivity. The source term $s = \sigma x$ is proportional to the total number of authors, and $\sigma$ is the rate at which the external source emits new authors. The leakage parameter $\gamma$ turns out not to influence the equilibrium state, but it influences the rate of convergence (when present) to the stationary solution.

The Kinetics

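$$\dot{x}_0 = s - f_0 - g_0,$$

$$\dot{x}_i = f_{i-1} - f_i - g_i = (\alpha + \beta(i-1))\,x_{i-1} - (\alpha + \beta i + \gamma)\,x_i, \qquad i = 1, 2, \ldots,$$

which yields (the transfer terms $f_i$ telescope in the sum over $i$, leaving the source minus the total leakage)

$$\dot{x} = (\sigma - \gamma)\,x.$$

Let us set $p_i \overset{\text{def}}{=} \frac{x_i}{x}$. Then

$$\dot{p}_i = \frac{d}{dt}\left(\frac{x_i}{x}\right) = (\alpha + \beta(i-1))\,p_{i-1} - (\alpha + \beta i + \sigma)\,p_i, \qquad i = 1, 2, \ldots,$$

$$\dot{p}_0 = \sigma - (\alpha + \sigma)\,p_0.$$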

The Stationary Solution

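The stationary solution of the equations for $\dot{p}_i$ and $\dot{p}_0$, i.e., with $\dot{p}_i = \dot{p}_0 = 0$, is thus clearly

$$p_i = \begin{cases} \dfrac{\sigma}{\alpha+\sigma}, & i = 0, \\[2ex] \dfrac{\alpha+\beta\cdot(i-1)}{\alpha+\beta\cdot i+\sigma}\, p_{i-1}, & i = 1, 2, \ldots \end{cases} \tag{14}$$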
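To see the kinetics settle into (14), one can integrate the equations for the relative frequencies numerically. The sketch below is an illustration only: it uses a plain forward-Euler scheme, truncates the infinite array to a finite number of cells (harmless here, because cell $i$ only depends on cell $i-1$), and starts with all mass in cell 0. The function names, step size, time horizon and parameter values are assumptions.

```python
def stationary_solution(alpha, beta, sigma, n_cells):
    """Stationary relative frequencies (14) for cells 0..n_cells-1."""
    p = [sigma / (alpha + sigma)]
    for i in range(1, n_cells):
        p.append(p[-1] * (alpha + beta * (i - 1)) / (alpha + beta * i + sigma))
    return p

def integrate_kinetics(alpha, beta, sigma, n_cells, dt=0.005, t_end=20.0):
    """Forward-Euler integration of dp_0/dt and dp_i/dt from p = (1, 0, 0, ...)."""
    p = [1.0] + [0.0] * (n_cells - 1)
    for _ in range(int(round(t_end / dt))):
        dp = [sigma - (alpha + sigma) * p[0]]
        for i in range(1, n_cells):
            dp.append((alpha + beta * (i - 1)) * p[i - 1]
                      - (alpha + beta * i + sigma) * p[i])
        p = [p_i + dt * d_i for p_i, d_i in zip(p, dp)]
    return p

alpha, beta, sigma = 1.0, 1.0, 3.0   # illustrative values
p_num = integrate_kinetics(alpha, beta, sigma, n_cells=25)
p_ref = stationary_solution(alpha, beta, sigma, n_cells=25)
print(max(abs(a - b) for a, b in zip(p_num, p_ref)))
```

The step size must stay small relative to the largest rate $\alpha + \beta i + \sigma$ among the retained cells, otherwise the explicit scheme becomes unstable.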

The Stationary Solution

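With $\beta = 0$ (no Matthew effect) we have

$$p_0 = \frac{\sigma}{\alpha+\sigma}, \qquad p_r = \frac{\alpha}{\alpha+\sigma}\, p_{r-1} = \ldots = \left(\frac{\alpha}{\alpha+\sigma}\right)^{r} \frac{\sigma}{\alpha+\sigma},$$

i.e., a geometric distribution with the parameter $\frac{\sigma}{\alpha+\sigma}$. Here $\Pr(\text{publication}) = \frac{\alpha}{\alpha+\sigma}$.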

Change of parameters

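We re-parametrize

$$\rho \leftrightarrow \frac{\sigma}{\beta}, \qquad \alpha \leftrightarrow \frac{\alpha}{\beta}.$$

Then

$$p_r = \begin{cases} \dfrac{\rho}{\alpha+\rho}, & r = 0, \\[2ex] \dfrac{\alpha+(r-1)}{\alpha+\rho+r}\, p_{r-1}, & r = 1, 2, \ldots \end{cases} \tag{15}$$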

The Waring Distribution

Thus the stationary solution of the cell system in terms of the relative frequencies of the cell contents,

$$p_r = \begin{cases} \dfrac{\rho}{\alpha+\rho}, & r = 0, \\[2ex] \dfrac{\alpha+(r-1)}{\alpha+\rho+r}\, p_{r-1}, & r = 1, 2, \ldots, \end{cases}$$

is nothing but the Waring distribution.

The condition $\rho > 1$ on the tail parameter for the existence of the mean is equivalent to $\sigma > \beta$, which means that the rate of infusion of new authors is higher than the rate of transfer of authors to higher publication numbers.

The New Europe

Consider $x_i$, the frequency of authors (e.g., in some field of science in a country during a period) with $i$ published papers. Let the relative frequency $p_i$ of authors with $i$ papers be computed from data without any model of production. Furthermore, one can agree that the mean of the publication distribution is some kind of measure of scientific productivity (in that field and country). Then

$$\mu = \sum_{i=1}^{i_{\max}} i\, p_i.$$

However, any reasonable estimate of productivity must involve the notion of potential authors, i.e., those doing research but not publishing during the period under consideration.

Estimation of the frequency of zero

Publication productivity data are by their very definition zero-truncated, i.e., there is no information about those who are not publishing (in a certain period of time). We shall now find a way to estimate the frequency of zero from zero-truncated (or, truncated to the left at one) data using the Waring distribution.

A.G.M. McKendrick: example of estimation of zero frequency

The problem of estimating the zero frequency (under a Poisson model) from zero-truncated data was first considered by A.G.M. McKendrick, also recognized for the McKendrick-von Foerster partial differential equation. McKendrick (1926) considered the case of estimating the number of individuals in an Indian village who were susceptible to infection but did not develop the symptoms. He developed a differential equation and solved it to get the negative binomial distribution, from which he obtained the Poisson distribution as a limiting case.

A.G.M. McKendrick: Applications of mathematics in medical problems. Proceedings of the Edinburgh Mathematical Society, 44, 1926, 98-130.

A pioneering example of estimation of zero frequency

McKendrick developed a moment estimator to find the number of individuals who were susceptible to infection but did not develop the symptoms. His data contained the number of individuals that did not develop the symptoms, thus including the immune ones.

We shall also use a kind of moment estimator of the zero frequency, using a remarkable characterization of the Waring distribution by truncated means.

X.L. Meng: The EM algorithm and medical studies: A historical link. Statistical Methods in Medical Research, 6, 1997, 23.

Left-truncation of War(ρ, α)

The proof (omitted) of the following theorem is an application of the recursion of the Waring probabilities and the Gamma function.

Theorem

If $X \sim \mathrm{War}(\rho, \alpha)$, then

$$\Pr(X = n + i \mid X \ge n) = \Pr(Y = i), \qquad i = 0, 1, 2, \ldots, \tag{16}$$

where $Y \sim \mathrm{War}(\rho, \alpha+n)$.

By (16) we have $\Pr(X - n = i \mid X \ge n) = \Pr(Y = i)$ and thus

$$E[X - n \mid X \ge n] = E[Y],$$

and since $Y \sim \mathrm{War}(\rho, \alpha+n)$, we get by (7)

$$E[Y] = \frac{\alpha+n}{\rho-1}. \tag{17}$$

Hence

$$E[X - n \mid X \ge n] = \frac{\alpha+n}{\rho-1}.$$

But $E[X - n \mid X \ge n] = E[X \mid X \ge n] - n$. Thus we have found that if $X \sim \mathrm{War}(\rho, \alpha)$, then

$$E[X \mid X \ge n] = \mu + n \cdot \mu_1, \qquad n = 0, 1, \ldots, \tag{18}$$

where $\mu = \frac{\alpha}{\rho-1}$ (as it should) and $\mu_1 = \frac{\rho}{\rho-1}$. In fact (18) is a characterization of $\mathrm{War}(\rho, \alpha)$.
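The affine form (18) of the left-truncated mean is easy to verify numerically. The sketch below (function names and parameter values are illustrative assumptions) computes $E[X \mid X \ge n]$ from a long but finite list of Waring probabilities and prints it next to $\mu + n\mu_1$.

```python
def waring_pmf(rho, alpha, n_terms):
    """Return [p_0, ..., p_{n_terms-1}] of War(rho, alpha) via the recursion (6)."""
    probs = [rho / (alpha + rho)]
    for r in range(1, n_terms):
        probs.append(probs[-1] * (alpha + r - 1) / (alpha + rho + r))
    return probs

def truncated_mean(probs, n):
    """E[X | X >= n] computed from a (numerically truncated) probability list."""
    tail = probs[n:]
    mass = sum(tail)
    return sum((n + i) * p for i, p in enumerate(tail)) / mass

rho, alpha = 3.0, 1.0                       # illustrative values, rho > 1
probs = waring_pmf(rho, alpha, n_terms=50000)
mu, mu1 = alpha / (rho - 1.0), rho / (rho - 1.0)
for n in range(0, 6):
    print(n, truncated_mean(probs, n), mu + n * mu1)
```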

A Characterization of War(ρ, α)

The following theorem is due to Glänzel and Schubert. A simplified proof is given by Dimaki and Xekalaki.

Theorem

$X \sim \mathrm{War}(\rho, \alpha)$ if and only if

$$E[X \mid X \ge k] = \mu + k \cdot \mu_1, \qquad k = 0, 1, \ldots, \tag{19}$$

where $\mu\ (= E[X] = E[X \mid X \ge 0])$ is given in (7) and $\mu_1 = \frac{\rho}{\rho-1}$.

W. Glänzel, A. Telcs & A. Schubert: Characterization by truncated moments and its application to Pearson type systems. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 66, 1984, pp. 173-183.

C. Dimaki & E. Xekalaki: Towards a unification of certain characterizations by conditional expectations. Annals of the Institute of Statistical Mathematics, 48, 1996, pp. 157-168.

A First Characterization of War(ρ, α)

The simplified proof of the characterization above applies another characterization.

Theorem

$X \sim \mathrm{War}(\rho, \alpha)$ if and only if

$$P(X > r) = \frac{\alpha+r}{\rho}\, P(X = r), \qquad r = 0, 1, \ldots \tag{20}$$

Proof: $\Leftarrow$: We assume that (20) is true for all $r = 0, 1, \ldots$. For $r = 0$ we have that $1 - P(X = 0) = P(X > 0) = \frac{\alpha}{\rho} P(X = 0)$, which is solved by $P(X = 0) = \rho/(\alpha+\rho)$, and this equals by (15) the probability of zero for $X \sim \mathrm{War}(\rho, \alpha)$. Next,

$$P(X > r+1) = P(X > r) - P(X = r+1)$$

and by (20)

$$P(X > r+1) = \frac{\alpha+r}{\rho}\, P(X = r) - P(X = r+1).$$

But we assume (20), so that

$$P(X > r+1) = \frac{\alpha+r+1}{\rho}\, P(X = r+1).$$

Thus it must hold that

$$\frac{\alpha+r+1}{\rho}\, P(X = r+1) = \frac{\alpha+r}{\rho}\, P(X = r) - P(X = r+1)$$

$$\Leftrightarrow \quad (\alpha+r+1)\, P(X = r+1) = (\alpha+r)\, P(X = r) - \rho\, P(X = r+1)$$

$$\Leftrightarrow \quad P(X = r+1) = \frac{\alpha+r}{\alpha+\rho+r+1}\, P(X = r),$$

which is the recursion for the Waring probabilities in (6).
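The slides prove only the direction $\Leftarrow$. As a numerical illustration of the converse (not a proof), the sketch below takes Waring probabilities generated by the recursion (6) and measures how far the two sides of (20) are apart for the first values of $r$. The function names and parameter values are assumptions.

```python
def waring_pmf(rho, alpha, n_terms):
    """Return [p_0, ..., p_{n_terms-1}] of War(rho, alpha) via the recursion (6)."""
    probs = [rho / (alpha + rho)]
    for r in range(1, n_terms):
        probs.append(probs[-1] * (alpha + r - 1) / (alpha + rho + r))
    return probs

def max_identity_gap(rho, alpha, r_max):
    """Largest |P(X > r) - (alpha + r)/rho * P(X = r)| over r = 0..r_max."""
    gap, cdf = 0.0, 0.0
    for r, p in enumerate(waring_pmf(rho, alpha, r_max + 1)):
        cdf += p                                   # P(X <= r)
        gap = max(gap, abs((1.0 - cdf) - (alpha + r) / rho * p))
    return gap

print(max_identity_gap(rho=2.5, alpha=1.7, r_max=50))  # should be at rounding-error level
```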

Waring regression

With regard to the Waring model of scientific productivity, we want to estimate the parameter $\mu$, which is the mean of the Waring distribution. We use the left-truncated mean from (19), which is an affine function of $k$:

$$E[X \mid X \ge k] = \mu + k \cdot \mu_1, \qquad k = 0, 1, \ldots$$

Let $y_k$ be the left-truncated sample mean, i.e., an estimate of $E[X \mid X \ge k]$, and let $k_{\max}$ be the maximum number of publications in the data. Then we write

$$y_k = \mu + k \cdot \mu_1 + e_k, \qquad k = 1, \ldots, k_{\max},$$

where the $e_k$ are random deviations (or residuals) of $y_k$ from the 'true' regression line. Then, by fitting a straight line by (weighted) least squares, we may estimate the intercept $\mu$ and the regression coefficient $\mu_1$.
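A minimal sketch of this regression step follows. It is an illustration under stated assumptions rather than the procedure used in the report: the slides do not say which weights are used, so the number of authors with at least $k$ papers is taken as the weight of $y_k$; the function names and the toy frequency table `counts` are hypothetical.

```python
def left_truncated_means(counts):
    """counts[k] = number of authors with exactly k papers (k >= 1).

    Returns (ks, ys, ns): for k = 1..k_max, ys is the sample mean of the
    publication count over authors with at least k papers, ns their number.
    """
    k_max = max(counts)
    ks, ys, ns = [], [], []
    for k in range(1, k_max + 1):
        n_k = sum(c for j, c in counts.items() if j >= k)
        s_k = sum(j * c for j, c in counts.items() if j >= k)
        ks.append(k)
        ys.append(s_k / n_k)
        ns.append(n_k)
    return ks, ys, ns

def weighted_line_fit(ks, ys, ws):
    """Weighted least squares for y = mu + k * mu1; returns (mu, mu1)."""
    w_sum = sum(ws)
    k_bar = sum(w * k for w, k in zip(ws, ks)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    s_kk = sum(w * (k - k_bar) ** 2 for w, k in zip(ws, ks))
    s_ky = sum(w * (k - k_bar) * (y - y_bar) for w, k, y in zip(ws, ks, ys))
    mu1 = s_ky / s_kk
    return y_bar - mu1 * k_bar, mu1

# Made-up toy data: number of authors with 1, 2, ... papers (no zero class).
counts = {1: 60, 2: 20, 3: 10, 4: 5, 6: 3, 9: 2}
ks, ys, ns = left_truncated_means(counts)
mu_hat, mu1_hat = weighted_line_fit(ks, ys, ws=ns)   # weights = n_k (assumption)
print(mu_hat, mu1_hat)
```

Other weighting schemes, or an unweighted fit, would only change the `ws` argument.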

A final question is how to obtain a figure for the uncertainty of the estimate of $\mu$. There seems to be no immediate analytic procedure for assessing this uncertainty since, in particular, we should perhaps not assume homoscedasticity and independence of the residuals.


The Waring Method

Consider the left truncated publication means $y_k$, $k = 1, \ldots, k_{\max}$, of some field of research or some university:

\[ y_k = \mu + k \cdot \mu_1 + e_k, \quad k = 1, \ldots, k_{\max}. \]

Find the estimate $\hat{\mu}$ of the intercept $\mu$ (= the average number of publications per person) and then

\[ \text{number of potential authors} = \frac{\text{number of papers}}{\hat{\mu}}. \]

There is a certain vagueness in the concept of potential authors in the literature. We now incorporate the frequency of zero.


Productivity . . .

Publication productivity data is by its very definition zero-truncated, i.e., there is no information on those who are not publishing (in a certain period of time). But as is clear from (19), the Waring distribution is not hampered by the truncation. As observed above, the expression (19) gives a way of finding, by linear regression against $k$, the estimated intercept $\hat{\mu}$, and this can be used to estimate the frequency of zero via $r = 0$ in (15) as

\[ \hat{p}_0 = \frac{\hat{\alpha} + \hat{\mu}}{\hat{\alpha}(\hat{\mu} + 1) + \hat{\mu}} \tag{21} \]

if $\hat{\alpha}$ is the estimate from data.
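As a small illustration, formula (21) can be evaluated directly. The Python sketch below also shows one possible way of turning the estimated zero frequency into a total number of potential authors, by dividing the number of observed (publishing) authors by one minus the zero frequency. That second step is an assumption added here for illustration. The function names and all numerical values are hypothetical.

```python
def zero_class_probability(alpha_hat, mu_hat):
    """Formula (21): estimated P(X = 0) from the estimates of alpha and mu."""
    return (alpha_hat + mu_hat) / (alpha_hat * (mu_hat + 1.0) + mu_hat)


def implied_population(n_observed, p0_hat):
    """Total number of potential authors implied by n_observed publishing
    authors when a share p0_hat of all potential authors has no publications.
    Illustrative assumption, not a formula taken from the slides."""
    return n_observed / (1.0 - p0_hat)


if __name__ == "__main__":
    p0 = zero_class_probability(alpha_hat=2.0, mu_hat=1.0)  # hypothetical values
    print(p0)
    print(implied_population(400, p0))
```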


Data and methods

A weakness of earlier empirical tests of the Waring based estimates is the lack of precise test data. In order to produce a satisfactory empirical dataset for testing the accuracy of Waring based estimates of the zero class frequency, a known publication frequency distribution that includes zero values has to be created. Next, the creation of a publication frequency dataset is described, which is based on figures concerning researchers at two Swedish universities. The different distributions created below are based on a selection of potential authors, i.e. categories of people that we expect could publish research papers.


Data and methods

Some of these potential authors will not have published during the selected time period (but possibly in another time period) and will thus form the zero class of the publication frequency distribution. It should be noted that the frequency distributions will vary depending on which categories of people are selected. We include professors, researchers and senior lecturers in the potential author definition.

Employee data concerning the time period 2005-2007 were obtained from two Swedish universities.


Data and methods

A selection of 729 and 949 potential authors from the respective universities was thereby obtained. Publication data were downloaded for each potential author from the Web of Science and compiled into a table where the number of publications (article, letter and review) associated with each potential author was listed. In addition, the number of first author publications and reprint publications by each potential author was extracted (the table is omitted here).


Data and methods

Furthermore, a random author was selected for each of the downloaded publications. This was achieved by randomly selecting a single author from the author list of each publication, resulting in a selection of one author per publication. The number of randomly selected authorships of each potential author was added to the table of publication frequencies. For each university and authorship type (first, reprint, all and random), a publication frequency distribution, i.e. the number of authors having one publication, two publications and so forth, was compiled (the table is omitted here). The zero frequencies were removed to form zero-truncated samples.
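The random authorship selection and the construction of a zero-truncated frequency distribution can be sketched in Python as below. This is only an illustration of the procedure as described, assuming uniform random selection within each author list. The data structures, function names and the tiny data set are hypothetical.

```python
import random
from collections import Counter


def random_authorships(author_lists, rng):
    """Pick one author uniformly at random from each publication's author
    list and count how often each author is picked."""
    return Counter(rng.choice(authors) for authors in author_lists)


def frequency_distribution(per_author_counts, potential_authors):
    """Map 'number of publications' to 'number of potential authors'.
    Potential authors who were never picked form the zero class."""
    return Counter(per_author_counts.get(a, 0) for a in potential_authors)


def zero_truncate(freq):
    """Drop the zero class, leaving what would be observable in practice."""
    return {k: v for k, v in freq.items() if k > 0}


if __name__ == "__main__":
    # Hypothetical author lists, one per publication
    publications = [["A", "B"], ["A"], ["B", "C"], ["A", "C"]]
    potential = ["A", "B", "C", "D"]  # "D" has no publications
    picks = random_authorships(publications, random.Random(0))
    freq = frequency_distribution(picks, potential)
    print(dict(freq))
    print(zero_truncate(freq))
```

Keeping the full distribution next to the truncated one is what allows a later comparison between the true and the estimated zero class.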


Waring based estimation of the population mean


The publication frequency distributions described above are zero-truncated samples: the zero frequencies are missing. The objective of the method presented and tested is to estimate these zero frequencies.

Extraction of left truncated sample means. The result is a set of data points ranging from one (zero-truncated) to the maximum value of the distribution, with increasing means.

Fitting of straight line. The data points are plotted and a straight line is fitted through the points using weighted least squares regression. The weights in A. Telcs, W. Glänzel and A. Schubert: Characterization and statistical test using truncated expectations for a class of skew distributions. Mathematical Social Sciences, 10, 1985, 169-178, are used.

Simplified Waring estimation. The estimation presented above may be simplified and calculated based only on the sample mean and the share of one-frequencies.


Agreement between expected and estimated productivity

The potential author data set provides a full population of publication frequencies, including zero frequencies. This provides for detailed comparisons of true and estimated population means. However, the data set is rather small and the estimates can therefore be expected to be unstable.


Waring Regression SLU-first author data


This is a fact!


Avdelningen för forskningspolitisk analys

Let us manually and arbitrarily assign some additional publications to the most productive author in the data set in the figure above:

This is one of the criticisms levelled by Avdelningen för forskningspolitisk analys against the Waring method! Another is that the regression fit is improved if the extreme values on the right are removed, as has been done by Telcs et al., but this removal has no basis in the theory.


Agreement between expected and estimated productivity

But when we use the estimate of zero frequency:


Agreement between expected and estimated productivity

The estimated values are for the most part very good. In many cases the estimates are within a 5 % margin of the expected values. Only in a few cases are the estimates considerably far from the expected values (> 20 %). Compared to the Simplified Waring, the full version of Waring performs slightly better.


Testing the Reliability of the Estimate

The results above indicate that Waring based estimates of the zero frequency in general produce good results. A concern is, however, that the estimates will be very sensitive to small variations in the sample, i.e. that the reliability of the estimates will be low. That is certainly the case for the estimates provided above, since they are based on relatively small samples (500-1000 authors).


Testing the Reliability of the Estimate

In cases where we do not need to know the zero class, larger samples can be created. In the following we construct a test of the reliability of the estimates when larger samples are used.


Memories


Error Analysis

To test the error margin on a larger sample, a second data set has been compiled. From the Web of Science, publications with Nordic addresses published between 2003 and 2006 were downloaded. A selection of Nordic authors was obtained by extracting first authors and reprint authors from the downloaded publications and connecting these to the designated addresses. Authors with non-Nordic addresses were removed. The restriction to first and reprint authors was necessary since other authors could not be associated with specific addresses.

The names of the selected set of author fractions were manually adjusted to distinguish between homonyms and to harmonize author fractions relating to the same person. The number of first and reprint authorships of each distinct author represented in the data was extracted and compiled into a table.


Error Analysis

Each publication was assigned to one of seven fields based on the classification of ISI subject categories. Following this, each distinct author was assigned to the field where the author had most publications. In cases where the number of publications was equal for two fields, one field was randomly selected. For each field, a publication frequency distribution, i.e. the number of authors having one publication, two publications and so forth, was compiled. The number of potential authors with zero publications in the selected time period is not included.
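The field assignment with random tie-breaking can be sketched in Python as below. The sketch assumes that an author's publication count in the per-field distribution is the total number of that author's publications, which the text does not state explicitly. The function names and the data are hypothetical.

```python
import random
from collections import Counter


def assign_field(publication_fields, rng):
    """Return the field in which the author has most publications;
    ties are broken uniformly at random."""
    counts = Counter(publication_fields)
    top = max(counts.values())
    candidates = sorted(f for f, c in counts.items() if c == top)
    return rng.choice(candidates)


def field_frequency_distributions(author_fields, rng):
    """author_fields maps author -> list with one field label per publication.
    Returns field -> Counter(number of publications -> number of authors).
    Assumption: all of an author's publications are counted in the field
    the author is assigned to."""
    result = {}
    for author, fields in author_fields.items():
        field = assign_field(fields, rng)
        result.setdefault(field, Counter())[len(fields)] += 1
    return result


if __name__ == "__main__":
    rng = random.Random(1)
    data = {  # hypothetical authors and field labels
        "author 1": ["physics", "physics", "chemistry"],
        "author 2": ["physics"],
        "author 3": ["medicine", "chemistry"],  # tie, resolved at random
    }
    print(field_frequency_distributions(data, rng))
```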


Bootstrap

A bootstrap technique for this is to resample, say $B$ times, the authorship data (described above), thus creating replicate authorship data sets. For each replicate data set one calculates the regression line, obtaining the intercept estimates $\hat{\mu}^{*}_1, \ldots, \hat{\mu}^{*}_B$, from which one can calculate the empirical distribution of the estimate of the intercept, its bootstrap mean and bootstrap standard deviation, and find fractiles to compute a bootstrap confidence interval for the intercept.
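A minimal Python sketch of such a bootstrap is given below. It resamples the per-author publication counts with replacement, refits the line through the left truncated sample means for each replicate, and reports the bootstrap mean, standard deviation and a percentile interval for the intercept. The weights (number of observations at or above each k), the percentile method for the interval, the function names and the sample data are all assumptions made for illustration.

```python
import random
import statistics


def intercept_estimate(counts):
    """Intercept of a weighted least squares line through the left truncated
    sample means y_k, k = 1..max(counts). The weight of y_k is the number of
    observations >= k (an assumed weighting)."""
    ks = list(range(1, max(counts) + 1))
    tails = [[x for x in counts if x >= k] for k in ks]
    ys = [sum(t) / len(t) for t in tails]
    ws = [len(t) for t in tails]
    w_sum = sum(ws)
    k_bar = sum(w * k for w, k in zip(ws, ks)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    s_kk = sum(w * (k - k_bar) ** 2 for w, k in zip(ws, ks))
    s_ky = sum(w * (k - k_bar) * (y - y_bar) for w, k, y in zip(ws, ks, ys))
    return y_bar - (s_ky / s_kk) * k_bar


def bootstrap_intercept(counts, n_replicates, rng, level=0.95):
    """Bootstrap mean, standard deviation and percentile interval of the
    intercept, based on n_replicates resamples of counts."""
    if max(counts) <= 1:
        raise ValueError("need at least two distinct values of k to fit a line")
    estimates = []
    while len(estimates) < n_replicates:
        replicate = rng.choices(counts, k=len(counts))
        if max(replicate) > 1:  # skip degenerate replicates with a single k
            estimates.append(intercept_estimate(replicate))
    estimates.sort()
    lo_idx = int((1 - level) / 2 * (n_replicates - 1))
    hi_idx = int((1 + level) / 2 * (n_replicates - 1))
    return (statistics.mean(estimates), statistics.stdev(estimates),
            (estimates[lo_idx], estimates[hi_idx]))


if __name__ == "__main__":
    # Hypothetical zero-truncated publication counts, one entry per author
    counts = [1] * 30 + [2] * 12 + [3] * 6 + [4] * 3 + [6]
    print(bootstrap_intercept(counts, 200, random.Random(42)))
```

Resampling authors rather than residuals avoids assuming homoscedastic or independent residuals for the regression line.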


The Bootstrap Distribution for the Intercept for Physics


Confidence intervals of estimated productivity

Bootstrapped population mean estimates were computed for each zero-truncated distribution of the field data set, and confidence intervals were calculated using the bootstrap.

The confidence intervals show that the differences between the different fields are in most cases small. For the social science fields, however, the confidence interval is large, which shows that estimation is difficult (as expected) for distributions having a large share of zeros. Still, the results indicate that Waring based estimates of productivity can be used for field comparisons.


Confidence Intervals for nine fields of science


The Question from Avdelningen för forskningspolitisk analys


Historiett

Historiette is a diminutive form of the French word histoire, and a literary term used in French since the 1700s. A historiette is a short story, like an anecdote. For readers of Swedish literature the term is almost exclusively identified with a book (a collection of short stories) by Hjalmar Söderberg titled "Historietter", first published in 1898.


Finally: Bibliometrics

... a statistical approach to master the flood of scientific information and to analyse and to understand the characteristics of big science by measuring quantitative aspects of communication in science and by providing the results to scientists and users outside the scientific community. Monitoring, description and modelling of the production, dissemination and use of knowledge was originally in the foreground.

... In the following two decades after 1980 bibliometrics was characterised by a shift towards science-policy and research-management application (W. Glänzel: The perspective shift, 2006)


The present historiett: A case of science policy

The system for funding allocation to public research institutions presented by the Swedish government in October 2008 was based on RUT 2. It was applied starting in 2009. In 2011 another report, an assignment from the Government, was authored by the then Chancellor of Swedish universities,

A. Flodström: Prestationsbaserad resurstilldelning för universitet och högskolor. U2011/7356/UH, 252 pages,

who amongst other things recommended abandoning the Waring method (as being controversial).


The present historiett: A comment on U2011/7356/UH

SULF has serious objections in principle to using quality indicators as a basis for resource allocation ... reject(s) the current performance-based model (= RUT 2) as well as the changes to it proposed by the investigator (= A. Flodström).


The present historiett: A comment on U2011/7356/UH (cont'd)

SULF firmly rejects ... a one-sided use of simple measures of past activity, such as the amount of external funding, the number of publications or citations, as a basis for the government's allocation of resources. The main reason is that such measures have a systematically conserving effect, favouring mainstream research, that is, more research on what we already know. Using such indicators in the allocation of research resources amounts to confusing quantity with quality.


Thank You! (Hippopotamus = ? in Swedish)


A.G.M. McKendrick again


... individuals, may meet and exchange ideas, the meeting may result in the transference of ...

The life of each individual consists of a train of such incidents, one following the other ...

react and interact amongst each other, and each individual lives a life which may be again considered as a succession of events, one following the other. If one thinks of these individuals, ..., as moving in all sorts of dimensions, reversibly or irreversibly, continuously or discontinuously


Succession of events: Happy years as Prof. emeritus! Thank you, Göran!
