Classification with imperfect training labels

Richard J. Samworth

University of Cambridge

39th Conference on Applied Statistics in Ireland (CASI 2019), Dundalk, Ireland

15 May 2019


Collaborators

Tim Cannings and Yingying Fan


Supervised classification


Classification and label noise

With perfect labels in the binary response setting, we observe

$(X_1, Y_1), \ldots, (X_n, Y_n) \overset{\mathrm{iid}}{\sim} P$, taking values in $\mathbb{R}^d \times \{0, 1\}$.

Task: Predict the class $Y$ of a new observation $X$, where $(X, Y) \sim P$ independently of the training data.

In many modern applications, however, it may be too expensive, difficult or time-consuming to determine class labels perfectly:

Uncorrupted: $(X_1, 1), (X_2, 1), (X_3, 0), (X_4, 0), \ldots, (X_n, 0)$

Corrupted: $(X_1, 1), (X_2, 0), (X_3, 0), (X_4, 0), \ldots, (X_n, 1)$
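To make the corruption mechanism concrete, here is a minimal sketch (mine, not from the slides) of ρ-homogeneous label noise, in which each training label is flipped independently with probability ρ; the function name and the use of NumPy are my own choices.

```python
import numpy as np

def corrupt_labels(y, rho, seed=None):
    """Flip each binary label independently with probability rho
    (rho-homogeneous label noise)."""
    rng = np.random.default_rng(seed)
    flip = rng.random(len(y)) < rho     # Bernoulli(rho) flip indicators
    return np.where(flip, 1 - y, y)     # flipped label where indicated

# Example: corrupt ten clean labels with rho = 0.3.
y_clean = np.array([1, 1, 0, 0, 0, 1, 0, 0, 1, 0])
y_noisy = corrupt_labels(y_clean, rho=0.3, seed=0)
```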


Existing work

The topic has been well-studied in the machine learning/computer science literature (Frénay and Kabán, 2014; Frénay and Verleysen, 2014).

▶ Lachenbruch (1966): LDA with zero intercept is consistent with ρ-homogeneous noise, where each observation is mislabelled independently with probability $\rho \in (0, 1/2)$.

▶ Okamoto and Nobuhiro (1997) consider the k-nearest neighbour classifier with n = 32 and small k:
'…the predictive accuracy of 1-NN is strongly affected by …class noise.'

▶ Ghosh et al. (2015): 'Many standard algorithms such as SVM perform poorly in the presence of label noise.'

Other work seeks to identify mislabelled observations and flip or remove them.


Motivating example

[Figure: scatter plots of the two classes, without label noise (left) and with label noise (right).]

Priors $\pi_0 = 0.9$, $\pi_1 = 0.1$. Class conditionals $X \mid Y = 0 \sim N_2\big((-1, 0)^\top, I_2\big)$, $X \mid Y = 1 \sim N_2\big((1, 0)^\top, I_2\big)$, $n = 1000$.

Left: no noise; right: ρ-homogeneous noise with ρ = 0.3.


Risks in motivating example

[Figure: misclassification error against log(n).]

Misclassification error for predicting the true label of the test point, for the knn (black), SVM (red) and LDA (blue) classifiers. Solid lines: no label noise; dashed lines: 0.3-homogeneous label noise.
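The following sketch (my own reconstruction, not the authors' code) mimics the experiment behind the last two slides: it draws data from the stated two-class Gaussian model, corrupts the training labels with 0.3-homogeneous noise, and compares the test errors of knn, SVM and LDA fitted to the clean and the noisy labels; the scikit-learn implementations, the choice k = 21 and the test-set size are my own.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

def sample(n):
    """pi_0 = 0.9, pi_1 = 0.1; X | Y = r ~ N_2((2r - 1, 0), I_2)."""
    y = (rng.random(n) < 0.1).astype(int)
    x = rng.standard_normal((n, 2))
    x[:, 0] += 2 * y - 1                     # class means (-1, 0) and (1, 0)
    return x, y

x_tr, y_tr = sample(1000)
x_te, y_te = sample(10000)
y_noisy = np.where(rng.random(len(y_tr)) < 0.3, 1 - y_tr, y_tr)   # 0.3-homogeneous noise

for name, clf in [("knn", KNeighborsClassifier(n_neighbors=21)),
                  ("SVM", SVC(kernel="rbf")),
                  ("LDA", LinearDiscriminantAnalysis())]:
    err_clean = np.mean(clf.fit(x_tr, y_tr).predict(x_te) != y_te)
    err_noisy = np.mean(clf.fit(x_tr, y_noisy).predict(x_te) != y_te)
    print(f"{name}: clean {err_clean:.3f}, noisy labels {err_noisy:.3f}")
```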


Statistical setting

Let $(X, Y, \tilde{Y}), (X_1, Y_1, \tilde{Y}_1), \ldots, (X_n, Y_n, \tilde{Y}_n)$ be i.i.d. triples taking values in $\mathcal{X} \times \{0, 1\} \times \{0, 1\}$.

We observe $(X_1, \tilde{Y}_1), \ldots, (X_n, \tilde{Y}_n)$ and $X$. The task is to predict $Y$.

▶ For $x \in \mathcal{X}$, define the regression function
$$\eta(x) := P(Y = 1 \mid X = x)$$
and its corrupted version
$$\tilde{\eta}(x) := P(\tilde{Y} = 1 \mid X = x).$$

▶ For $x \in \mathcal{X}$ and $r \in \{0, 1\}$, the conditional noise probabilities are
$$\rho_r(x) := P(\tilde{Y} \neq Y \mid X = x, Y = r).$$

We also write $P_X$ for the marginal distribution of $X$.


Classifiers

A classifier $C$ is a (measurable) function from $\mathcal{X}$ to $\{0, 1\}$.

The risk $R(C) := P\{C(X) \neq Y\}$ is minimised by the Bayes classifier
$$C^{\mathrm{Bayes}}(x) := \begin{cases} 1 & \text{if } \eta(x) \geq 1/2 \\ 0 & \text{otherwise.} \end{cases}$$

A classifier $C_n$, depending on the training data, is said to be consistent if $R(C_n) \to R(C^{\mathrm{Bayes}})$ as $n \to \infty$.

The corrupted risk $\tilde{R}(C) := P\{C(X) \neq \tilde{Y}\}$ is minimised by the corrupted Bayes classifier
$$\tilde{C}^{\mathrm{Bayes}}(x) := \begin{cases} 1 & \text{if } \tilde{\eta}(x) \geq 1/2 \\ 0 & \text{otherwise.} \end{cases}$$


General finite-sample result

Let $S := \{x \in \mathcal{X} : \eta(x) = 1/2\}$, let $B := \{x \in S^c : \rho_0(x) + \rho_1(x) < 1\}$ and let
$$A := \bigg\{x \in B : \frac{\rho_1(x) - \rho_0(x)}{\{2\eta(x) - 1\}\{1 - \rho_0(x) - \rho_1(x)\}} < 1\bigg\}.$$

Theorem. (i) $P_X\big(A \,\triangle\, \{x \in B : \tilde{C}^{\mathrm{Bayes}}(x) = C^{\mathrm{Bayes}}(x)\}\big) = 0$.

(ii) Now suppose there exist $\rho^* < 1/2$ and $a^* < 1$ such that
$$P_X\big(\{x \in S^c : \rho_0(x) + \rho_1(x) > 2\rho^*\}\big) = 0 \quad\text{and}\quad P_X\bigg(\bigg\{x \in B : \frac{\rho_1(x) - \rho_0(x)}{\{2\eta(x) - 1\}\{1 - \rho_0(x) - \rho_1(x)\}} > a^*\bigg\}\bigg) = 0.$$
Then, for any classifier $C$,
$$R(C) - R(C^{\mathrm{Bayes}}) \;\leq\; \frac{\tilde{R}(C) - \tilde{R}(\tilde{C}^{\mathrm{Bayes}})}{(1 - 2\rho^*)(1 - a^*)}.$$
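As a quick sanity check (my own specialisation, not on the slide), consider ρ-homogeneous noise, i.e. $\rho_0(x) = \rho_1(x) = \rho < 1/2$ for all $x$. Then $\rho_0(x) + \rho_1(x) = 2\rho < 1$, so $B = S^c$, and the ratio in the definition of $A$ is identically $0 < 1$, so $A = B$: by part (i), the corrupted and uncorrupted Bayes classifiers agree off $S$. Taking $\rho^* = \rho$ and $a^* = 0$ in part (ii) gives, for any classifier $C$,
$$R(C) - R(C^{\mathrm{Bayes}}) \;\leq\; \frac{\tilde{R}(C) - \tilde{R}(\tilde{C}^{\mathrm{Bayes}})}{1 - 2\rho}.$$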


Discussion

▶ This result is particularly useful when the classifier $C$ is trained using the noisy labels, i.e. with $(X_1, \tilde{Y}_1), \ldots, (X_n, \tilde{Y}_n)$, since then the training and test data in $\tilde{R}(C)$ have the same distribution.

▶ We can then find conditions under which a classifier trained with imperfect labels will remain consistent for classifying uncorrupted test data points.

For specific classifiers and under stronger conditions, we can provide further control of the excess risk $R(C) - R(C^{\mathrm{Bayes}})$.


The k-nearest neighbour classifier

For $x \in \mathbb{R}^d$, let $(X_{(1)}, \tilde{Y}_{(1)}), \ldots, (X_{(n)}, \tilde{Y}_{(n)})$ be the reordering of the corrupted training data pairs such that
$$\|X_{(1)} - x\| \leq \cdots \leq \|X_{(n)} - x\|.$$
Define
$$C^{\mathrm{knn}}(x) := \begin{cases} 1 & \text{if } \frac{1}{k}\sum_{i=1}^{k} \mathbb{1}_{\{\tilde{Y}_{(i)} = 1\}} \geq 1/2 \\ 0 & \text{otherwise.} \end{cases}$$

Corollary. Assume the conditions of part (ii) of the lemma. If $k = k_n \to \infty$, but $k/n \to 0$, then
$$R(C^{\mathrm{knn}}) - R(C^{\mathrm{Bayes}}) \to 0$$
as $n \to \infty$.
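A minimal NumPy sketch of this rule (my own illustration, with hypothetical names): sort the corrupted training pairs by distance to $x$ and take a majority vote among the $k$ nearest noisy labels.

```python
import numpy as np

def knn_classify(x, X_train, y_noisy, k):
    """k-nearest neighbour vote using the (possibly corrupted) labels y_noisy."""
    dists = np.linalg.norm(X_train - x, axis=1)    # ||X_i - x|| for each training point
    nearest = np.argsort(dists)[:k]                # indices of the k nearest points
    return int(np.mean(y_noisy[nearest]) >= 0.5)   # majority vote; ties go to class 1

# Example: classify the origin using its 3 nearest neighbours.
X_train = np.array([[0.1, 0.0], [-0.2, 0.1], [2.0, 2.0], [0.0, -0.1]])
y_noisy = np.array([1, 0, 1, 1])
print(knn_classify(np.zeros(2), X_train, y_noisy, k=3))   # -> 1
```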


Further assumptions

▶ Label noise: Assume the conditions of part (ii) of the lemma and that
$$\rho_0(x) = g(\eta(x)) \quad\text{and}\quad \rho_1(x) = g(1 - \eta(x)),$$
where $g : (0, 1) \to [0, 1)$ is twice differentiable. Assume that $g'(1/2) > 2g(1/2) - 1$ and that $g''$ is uniformly continuous.

▶ Distribution (Cannings et al., 2018): Among other technical conditions, assume that $P_X$ has a density $f$, that $\eta$ is twice continuously differentiable with $\inf_{x_0 \in S} \|\dot{\eta}(x_0)\| > 0$, and that
$$\int_{\mathbb{R}^d} \|x\|^{\alpha} f(x)\,dx < \infty.$$

▶ For $\beta \in (0, 1/2)$, let
$$K_\beta := \big\{\lceil (n-1)^{\beta} \rceil, \ldots, \lfloor (n-1)^{1-\beta} \rfloor\big\}.$$


Asymptotic expansion

Theorem. Under our assumptions, we have two cases:

(i) Suppose that $d \geq 5$ and $\alpha > \frac{4d}{d-4}$, and let $\nu_{n,k} := k^{-1} + (k/n)^{4/d}$. Then there exist $B_1 = B_1(d, P) > 0$ and $B_2 = B_2(d, P) \geq 0$ such that, for each $\beta \in (0, 1/2)$,
$$R(C^{\mathrm{knn}}) - R(C^{\mathrm{Bayes}}) = \frac{B_1}{k\{1 - 2g(1/2) + g'(1/2)\}^2} + B_2\Big(\frac{k}{n}\Big)^{4/d} + o(\nu_{n,k})$$
as $n \to \infty$, uniformly for $k \in K_\beta$.

(ii) Suppose that either $d \leq 4$, or $d \geq 5$ and $\alpha \leq \frac{4d}{d-4}$. Then for each $\epsilon > 0$ and $\beta \in (0, 1/2)$, we have
$$R(C^{\mathrm{knn}}) - R(C^{\mathrm{Bayes}}) = \frac{B_1}{k\{1 - 2g(1/2) + g'(1/2)\}^2} + o\bigg(\frac{1}{k} + \Big(\frac{k}{n}\Big)^{\frac{\alpha}{\alpha + d} - \epsilon}\bigg)$$
as $n \to \infty$, uniformly for $k \in K_\beta$.


Relative asymptotic performance

Given $k$ to be used by the knn classifier in the noiseless case, let
$$k_g := \Big\lfloor \{1 - 2g(1/2) + g'(1/2)\}^{-2d/(d+4)}\, k \Big\rfloor.$$
This coupling reflects the ratio of the optimal choices of $k$ in the corrupted and uncorrupted settings.

Corollary. Under the assumptions of part (i) of the theorem, and provided $B_2 > 0$, we have that for any $\beta \in (0, 1/2)$,
$$\frac{R(C^{\mathrm{knn}}_{k_g}) - R(C^{\mathrm{Bayes}})}{R(C^{\mathrm{knn}}_{k}) - R(C^{\mathrm{Bayes}})} \to \frac{1}{\{1 - 2g(1/2) + g'(1/2)\}^{8/(d+4)}}$$
as $n \to \infty$, uniformly for $k \in K_\beta$.

If $g'(1/2) > 2g(1/2)$, then the label noise improves the asymptotic performance!


Intuition

For $x \in S^c$, we have
$$\tilde{\eta}(x) - 1/2 = \{1 - \rho_1(x)\}\eta(x) + \rho_0(x)\{1 - \eta(x)\} - 1/2 = \{\eta(x) - 1/2\}\bigg\{1 - \rho_0(x) - \rho_1(x) + \frac{\rho_0(x) - \rho_1(x)}{2\eta(x) - 1}\bigg\}.$$

But, writing $t := \eta(x) - 1/2$,
$$1 - \rho_0(x) - \rho_1(x) + \frac{\rho_0(x) - \rho_1(x)}{2\eta(x) - 1} = 1 - g(1/2 + t) - g(1/2 - t) + \frac{g(1/2 + t) - g(1/2 - t)}{2t} \;\xrightarrow{t \to 0}\; 1 - 2g(1/2) + g'(1/2).$$
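A quick numerical check of this limit, with an illustrative smooth $g$ of my own choosing (not from the slides):

```python
def g(u):
    """Illustrative noise function with g(1/2) = 0.1 and g'(1/2) = 0.05,
    so the limit 1 - 2 g(1/2) + g'(1/2) equals 0.85."""
    return 0.1 + 0.05 * (u - 0.5) + 0.2 * (u - 0.5) ** 2

for t in [0.1, 0.01, 0.001]:
    val = 1 - g(0.5 + t) - g(0.5 - t) + (g(0.5 + t) - g(0.5 - t)) / (2 * t)
    print(t, val)   # 0.846, 0.84996, 0.8499996 -> 0.85
```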


Estimated regret ratios

Model: $X \mid Y = r \sim N_5(\mu_r, I_5)$, where $\mu_1 = (3/2, 0, 0, 0, 0)^\top = -\mu_0$, $\pi_1 = 0.5$.

Labels: Let $g(1/2 + t) = 0 \vee \min\{g_0(1 + h_0 t), 2g_0\}$, then set $\rho_0(x) = g(\eta(x))$ and $\rho_1(x) = g(1 - \eta(x))$.

[Figure: estimated regret ratio against log(n), approaching the asymptotic values below.]

g0    h0    Asymptotic RR
0.1   −1    1.37
0.1    0    1.22
0.1    1    1.10
0.1    2    1
0.1    3    0.92
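A short sketch (mine, not the authors' code) reproducing the 'Asymptotic RR' column from the corollary: for this $g$ we have $g(1/2) = g_0$ and $g'(1/2) = g_0 h_0$, so with $d = 5$ the limiting regret ratio is $\{1 - 2g_0 + g_0 h_0\}^{-8/(d+4)}$.

```python
def asymptotic_regret_ratio(g0, h0, d=5):
    """Limiting regret ratio {1 - 2 g(1/2) + g'(1/2)}^{-8/(d+4)} for the
    noise function g(1/2 + t) = max(0, min(g0 * (1 + h0 * t), 2 * g0)),
    which has g(1/2) = g0 and g'(1/2) = g0 * h0."""
    c = 1 - 2 * g0 + g0 * h0
    return c ** (-8 / (d + 4))

for h0 in [-1, 0, 1, 2, 3]:
    print(h0, round(asymptotic_regret_ratio(0.1, h0), 2))
# 1.37, 1.22, 1.1, 1.0, 0.92 -- matching the table above
```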


Support Vector Machines

Let $\mathcal{H}$ denote an RKHS, and let $L(y, t) := \max\{0, 1 - (2y - 1)t\}$ denote the hinge loss function. The SVM classifier is given by
$$C^{\mathrm{SVM}}(x) := \begin{cases} 1 & \text{if } \hat{f}(x) \geq 0 \\ 0 & \text{otherwise,} \end{cases}$$
where
$$\hat{f} \in \operatorname*{argmin}_{f \in \mathcal{H}} \bigg\{\frac{1}{n}\sum_{i=1}^{n} L(\tilde{Y}_i, f(X_i)) + \lambda \|f\|_{\mathcal{H}}^2\bigg\}.$$

We focus on the case where $\mathcal{H}$ has the Gaussian radial basis reproducing kernel function $K(x, x') := \exp(-\sigma^2 \|x - x'\|^2)$, for $\sigma > 0$.
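In practice this optimisation is rarely solved directly. The sketch below (my own, not the authors' implementation) fits a Gaussian-kernel SVM with scikit-learn's SVC, which minimises $\frac{1}{2}\|f\|^2 + C\sum_i L(y_i, f(x_i))$ and also fits an intercept, so up to that intercept the penalties correspond via $C \approx 1/(2n\lambda)$ and gamma $= \sigma^2$.

```python
import numpy as np
from sklearn.svm import SVC

def fit_svm(X, y_noisy, lam, sigma):
    """Gaussian-kernel SVM trained on the labels as observed, with the slide's
    (lambda, sigma) translated to scikit-learn's (C, gamma).
    Caveat: SVC also fits an intercept, which the formulation above omits."""
    n = len(y_noisy)
    clf = SVC(kernel="rbf", C=1.0 / (2 * n * lam), gamma=sigma ** 2)
    return clf.fit(X, y_noisy)

# Toy usage with hypothetical data and tuning values.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] + 0.3 * rng.standard_normal(200) > 0).astype(int)
clf = fit_svm(X, y, lam=1e-3, sigma=1.0)
print(clf.predict([[1.0, 0.0], [-1.0, 0.0]]))   # expect classes 1 and 0
```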


SVM asymptotic analysis

If $P_X$ is compactly supported and $\lambda = \lambda_n$ is chosen appropriately, then this SVM classifier is consistent in the uncorrupted labels case (Steinwart, 2005).

Corollary. Assume the conditions of our lemma, and suppose that $P_X$ is compactly supported. If $\lambda = \lambda_n \to 0$ but $n\lambda_n / |\log \lambda_n|^{d+1} \to \infty$, then
$$R(C^{\mathrm{SVM}}) - R(C^{\mathrm{Bayes}}) \to 0$$
as $n \to \infty$.


SVM assumptions

1. We say that the distribution $P$ satisfies the margin assumption with parameter $\gamma_1 \in [0, \infty)$ if there exists $\kappa_1 > 0$ such that
$$P_X\big(\{x \in \mathbb{R}^d : 0 < |\eta(x) - 1/2| \leq t\}\big) \leq \kappa_1 t^{\gamma_1}$$
for all $t > 0$.

2. Let $S^+ := \{x \in \mathbb{R}^d : \eta(x) > 1/2\}$ and $S^- := \{x \in \mathbb{R}^d : \eta(x) < 1/2\}$, and for $x \in \mathbb{R}^d$, let $\tau_x := \inf_{x' \in S \cup S^+} \|x - x'\| + \inf_{x' \in S \cup S^-} \|x - x'\|$. Say $P$ has geometric noise exponent $\gamma_2 \in [0, \infty)$ if there exists $\kappa_2 > 0$ such that
$$\int_{\mathbb{R}^d} |2\eta(x) - 1| \exp\Big(\!-\frac{\tau_x^2}{t^2}\Big)\, dP_X(x) \leq \kappa_2 t^{\gamma_2 d}$$
for all $t > 0$.


Rate of convergence

With perfect labels and when $P_X\big(B(0, 1)\big) = 1$, the excess risk of the SVM classifier is $O(n^{-\Gamma + \epsilon})$ for every $\epsilon > 0$, where
$$\Gamma := \begin{cases} \dfrac{\gamma_2}{2\gamma_2 + 1} & \text{if } \gamma_2 \leq \dfrac{\gamma_1 + 2}{2\gamma_1}, \\[2ex] \dfrac{2\gamma_2(\gamma_1 + 1)}{2\gamma_2(\gamma_1 + 2) + 3\gamma_1 + 4} & \text{otherwise} \end{cases}$$
(Steinwart and Scovel, 2007).

Theorem. Suppose that $P$ has margin parameter $\gamma_1 \in [0, \infty]$, geometric noise exponent $\gamma_2 \in (0, \infty)$ and $P_X\big(B(0, 1)\big) = 1$. Assume the conditions of the lemma and that $\rho_0(x) = g(\eta(x))$, $\rho_1(x) = g(1 - \eta(x))$, where $g : (0, 1) \to [0, 1)$ is differentiable at $1/2$.

Let $\lambda = \lambda_n := n^{-(\gamma_2 + 1)\Gamma/\gamma_2}$ and $\sigma = \sigma_n := n^{\Gamma/(\gamma_2 d)}$. Then
$$R(C^{\mathrm{SVM}}) - R(C^{\mathrm{Bayes}}) = O(n^{-\Gamma + \epsilon})$$
as $n \to \infty$, for every $\epsilon > 0$.
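A small helper (my own sketch) that evaluates the exponent $\Gamma$ and the corresponding tuning schedule for given $\gamma_1$ and $\gamma_2$; the exponents follow my reading of the displays above.

```python
def gamma_exponent(g1, g2):
    """Rate exponent Gamma as a function of the margin parameter g1 = gamma_1
    and the geometric noise exponent g2 = gamma_2 (Steinwart and Scovel, 2007)."""
    if g1 == 0 or g2 <= (g1 + 2) / (2 * g1):   # g1 = 0 makes the threshold infinite
        return g2 / (2 * g2 + 1)
    return 2 * g2 * (g1 + 1) / (2 * g2 * (g1 + 2) + 3 * g1 + 4)

def tuning_schedule(n, g1, g2, d):
    """Tuning values lambda_n = n^{-(g2 + 1) Gamma / g2} and sigma_n = n^{Gamma / (g2 d)}."""
    G = gamma_exponent(g1, g2)
    return n ** (-(g2 + 1) * G / g2), n ** (G / (g2 * d))

# Example: gamma_1 = 1, gamma_2 = 2, d = 5, n = 10**4.
print(gamma_exponent(1.0, 2.0), tuning_schedule(10**4, 1.0, 2.0, 5))
```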


Linear Discriminant Analysis

Suppose that $P_r = N_d(\mu_r, \Sigma)$ for $r = 0, 1$. Then
$$C^{\mathrm{Bayes}}(x) = \begin{cases} 1 & \text{if } \log\big(\frac{\pi_1}{\pi_0}\big) + \big(x - \frac{\mu_0 + \mu_1}{2}\big)^\top \Sigma^{-1}(\mu_1 - \mu_0) \geq 0 \\ 0 & \text{otherwise.} \end{cases}$$
Define
$$C^{\mathrm{LDA}}(x) := \begin{cases} 1 & \text{if } \log\big(\frac{\hat{\pi}_1}{\hat{\pi}_0}\big) + \big(x - \frac{\hat{\mu}_0 + \hat{\mu}_1}{2}\big)^\top \hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_0) \geq 0 \\ 0 & \text{otherwise,} \end{cases}$$
where $\hat{\pi}_r := n^{-1}\sum_{i=1}^{n} \mathbb{1}_{\{\tilde{Y}_i = r\}}$, $\hat{\mu}_r := \sum_{i=1}^{n} X_i \mathbb{1}_{\{\tilde{Y}_i = r\}} \big/ \sum_{i=1}^{n} \mathbb{1}_{\{\tilde{Y}_i = r\}}$, and
$$\hat{\Sigma} := \frac{1}{n - 2}\sum_{i=1}^{n}\sum_{r=0}^{1} (X_i - \hat{\mu}_r)(X_i - \hat{\mu}_r)^\top \mathbb{1}_{\{\tilde{Y}_i = r\}}.$$
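The sketch below (my own, with hypothetical names) computes these plug-in estimates from the labels as observed, i.e. possibly mislabelled, and evaluates the resulting LDA rule:

```python
import numpy as np

def fit_lda(X, y_obs):
    """Plug-in estimates (pi_hat, mu_hat, Sigma_hat); y_obs is an integer
    array of observed 0/1 labels (possibly corrupted)."""
    n = len(y_obs)
    pi = np.array([np.mean(y_obs == r) for r in (0, 1)])
    mu = np.array([X[y_obs == r].mean(axis=0) for r in (0, 1)])
    resid = X - mu[y_obs]                  # centre each point at its class mean
    Sigma = resid.T @ resid / (n - 2)      # pooled covariance estimate
    return pi, mu, Sigma

def lda_classify(x, pi, mu, Sigma):
    """Class 1 iff log(pi1/pi0) + (x - (mu0 + mu1)/2)^T Sigma^{-1} (mu1 - mu0) >= 0."""
    w = np.linalg.solve(Sigma, mu[1] - mu[0])
    return int(np.log(pi[1] / pi[0]) + (x - mu.mean(axis=0)) @ w >= 0)
```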


LDA asymptotic analysis

Theorem. Assume we have ρ-homogeneous noise ($\rho < 1/2$) and suppose that $P_r = N_d(\mu_r, \Sigma)$, for $r = 0, 1$. Then
$$\lim_{n \to \infty} C^{\mathrm{LDA}}(x) = \begin{cases} 1 & \text{if } c_0 + \big(x - \frac{\mu_0 + \mu_1}{2}\big)^\top \Sigma^{-1}(\mu_1 - \mu_0) > 0 \\ 0 & \text{if } c_0 + \big(x - \frac{\mu_0 + \mu_1}{2}\big)^\top \Sigma^{-1}(\mu_1 - \mu_0) < 0, \end{cases}$$
where $c_0$ can be expressed in terms of $\Delta^2 := (\mu_1 - \mu_0)^\top \Sigma^{-1}(\mu_1 - \mu_0)$, $\rho$ and $\pi_1$. As a consequence,
$$\lim_{n \to \infty} R(C^{\mathrm{LDA}}) = \pi_0 \Phi\Big(\frac{c_0}{\Delta} - \frac{\Delta}{2}\Big) + \pi_1 \Phi\Big(-\frac{c_0}{\Delta} - \frac{\Delta}{2}\Big) \geq R(C^{\mathrm{Bayes}}), \qquad (1)$$
with equality if $\pi_0 = \pi_1 = 1/2$. Moreover, for each $\rho \in (0, 1/2)$ and $\pi_0 \neq \pi_1$, there is a unique value of $\Delta > 0$ for which we have equality in (1).


LDA with ρ-homogeneous noise

[Figure: misclassification error against log(n) for LDA under homogeneous label noise.]

Here, $X \mid \{Y = r\} \sim N_5(\mu_r, I_5)$, where $\mu_1 = (\tfrac{3}{2}, 0, \ldots, 0)^\top = -\mu_0 \in \mathbb{R}^5$, and $\pi_1 = 0.9$.

No label noise (black), ρ-homogeneous noise for ρ = 0.1 (red), 0.2 (blue), 0.3 (green) and 0.4 (purple). The dotted lines show our asymptotic limit.


Summary

▶ The knn and SVM classifiers remain consistent with label noise under mild assumptions on the noise mechanism and data distribution.

▶ Under stronger conditions, the rate of convergence of the excess risk for these classifiers is preserved.

▶ However, the LDA classifier is typically not consistent, unless the class priors are equal (even with homogeneous noise).

Main reference:

▶ Cannings, T. I., Fan, Y. and Samworth, R. J. (2018) Classification with imperfect training labels. https://arxiv.org/abs/1805.11505.


Other references

▶ Cannings, T. I., Berrett, T. B. and Samworth, R. J. (2018) Local nearest neighbour classification with applications to semi-supervised learning. https://arxiv.org/abs/1704.00642v2.

▶ Frénay, B. and Kabán, A. (2014) A comprehensive introduction to label noise. Proc. Euro. Symp. Artificial Neural Networks, 667–676.

▶ Frénay, B. and Verleysen, M. (2014) Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst., 25, 845–869.

▶ Ghosh, A., Manwani, N. and Sastry, P. S. (2015) Making risk minimization tolerant to label noise. Neurocomputing, 160, 93–107.

▶ Lachenbruch, P. A. (1966) Discriminant analysis when the initial samples are misclassified. Technometrics, 8, 657–662.

▶ Okamoto, S. and Nobuhiro, Y. (1997) An average-case analysis of the k-nearest neighbor classifier for noisy domains. In Proc. 15th Int. Joint Conf. Artif. Intell., 1, 238–243.

▶ Steinwart, I. (2005) Consistency of support vector machines and other regularized kernel classifiers. IEEE Trans. Inf. Theory, 51, 128–142.

▶ Steinwart, I. and Scovel, C. (2007) Fast rates for support vector machines using Gaussian kernels. Ann. Statist., 35, 575–607.
