Università degli Studi di Bari
Dipartimento Interateneo di Fisica "Michelangelo Merlin"
Nuclear, Subnuclear and Astroparticle physics curriculum
MASTER DEGREE THESIS
STUDY OF LOW MOMENTUM MUONS RECONSTRUCTION WITH CMS EXPERIMENT AT LHC
Thesis supervisors
Dott.ssa Anna Colaleo
Dott.ssa Rosamaria Venditti
Candidate
Leonarda Lorusso
Academic year 2018/2019
Contents
Introduction 1
1 The theoretical scenario at LHC 4
1.1 The Standard Model of the Particle Physics . . . . . . . . . . 4
1.2 QCD: Quantum Chromodynamics . . . 8
1.3 The electroweak interaction . . . . . . . . . . . . . . . . . . . 11
1.4 The Higgs mechanism . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Standard Model re-discovery at CMS . . . . . . . . . . . . . . 14
1.6 Open questions and Physics beyond SM (BSM) . . . . . . . . 17
1.6.1 Low momentum muons as a tool to probe New Physics 21
2 The CMS experiment at LHC 24
2.1 The Large Hadron Collider . . . . . . . . . . . . . . . . . . . . 24
2.1.1 Technical characteristics . . . . . . . . . . . . . . . . . 24
2.1.2 LHC between past and future . . . . . . . . . . . . . . 26
2.2 The CMS Experiment . . . . . . . . . . . . . . . . . . . . . . 28
2.2.1 The detector . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.2 The CMS muon system . . . . . . . . . . . . . . . . . . 34
2.2.3 The trigger system . . . . . . . . . . . . . . . . . . . . 41
2.3 The Run 2 data taking conditions . . . . . . . . . . . . . . . . 42
2.3.1 Future Perspectives . . . 43
3 Muon Reconstruction and Identification at CMS 46
3.1 The offline muon reconstruction at CMS . . . 46
3.1.1 Reconstruction of the hits and the segments . . . 47
3.1.2 Track reconstruction . . . 50
3.2 Muon identification at CMS . . . 54
3.3 Muon isolation . . . 56
3.4 Performance of Muon Identification and Isolation Algorithms . . . 58
4 Study of a low-pT muon identification algorithm with MVA techniques 61
4.1 Introduction . . . 61
4.2 Monte Carlo simulation in CMS . . . 62
4.2.1 Matching generated to reconstructed particles . . . 64
4.3 Signal description: Ds → τ + ντ → 3µ + ντ . . . 64
4.4 Study of the background composition . . . 66
4.5 Performance of the standard muon ID algorithms . . . 69
4.6 Implementation of the MVA discriminator . . . 71
4.6.1 Signal and Background definition . . . 74
4.6.2 Input Discriminating variables . . . 76
4.6.3 Algorithms setup and configuration . . . 79
4.6.4 Performance of the discriminators: Kolmogorov-Smirnov test, response and ROC curve . . . 82
4.7 Validation in the control region Ds −→ φπ and in the Minimum Bias events . . . 87
Conclusion 94
Appendix A Introduction to Multivariate Analysis and statistical learning 94
A.1 Definitions and basic concepts . . . 94
A.1.1 Supervised and Unsupervised learning . . . 95
A.1.2 Classification vs Regression . . . 95
A.1.3 How does one choose a statistical learning method? . . . 96
A.1.4 Problems . . . 98
A.1.5 Data Preprocessing . . . 99
A.2 Classification methods . . . 100
A.2.1 Linear Discriminant Analysis . . . 100
A.2.2 Boosted Decision Tree . . . 102
A.2.3 Multilayer Perceptron . . . 106
A.2.4 BFGS . . . 111
A.3 TMVA: Toolkit for Multivariate Analysis . . . 112
Appendix B Performance of the muon reconstruction algorithms 114
B.1 Signal and Background definition in training phase . . . 114
B.2 Validation phase results . . . . . . . . . . . . . . . . . . . . . 117
Bibliography 122
Introduction
The Standard Model (SM) is one of the greatest triumphs of science, representing more than a century of experimental discoveries and theoretical innovations in particle physics. Its predictions have been verified experimentally both in the electroweak and in the strong sector, up to the electroweak scale (~10² GeV). Nevertheless, what happens beyond this scale is still unknown and under investigation.
Various alternatives to the SM involve new symmetries, forces and constituents. The search for new physical phenomena, or for further confirmation of the validity of the Standard Model, is entirely dominated by the ability of each experiment to carry out different measurements accurately. The Compact Muon Solenoid (CMS) experiment at the LHC (Large Hadron Collider) has been designed in this direction. This work is focused on the crucial role played by muons in the searches for new physics scenarios with the CMS experiment. Within CMS, muons are detected by a dedicated system, composed of detectors based on different gas ionization technologies, which allows the muon momentum to be measured with good precision up to the TeV scale¹.
Very accurate and efficient algorithms have been developed to reconstruct and identify muons over a large momentum spectrum. The range pT > 30 GeV is dominated by muons from W and Z decays, used as standard candles to check the muon reconstruction and identification performance in the context of the Higgs boson searches.
In the range pT < 30 GeV, the most abundant source of muons is the semileptonic decay of heavy-flavor hadrons, characterized by final state topologies that are similar to those of the muons involved in the signature of some new physics processes, such as charged lepton flavor violating decays of heavier leptons and supersymmetric light bosons; these muons are accompanied by
¹ In this thesis, Natural Units will be used: c = ℏ = 1, where ℏ = h/2π = 6.58211889(26) · 10⁻²² MeV·s and c = 299792458 m·s⁻¹
a high rate of background muons, resulting from the decay of light-flavor hadrons (such as π and K) and from hadrons that are erroneously identified as muons. The relative weight of this background contribution is quite sensitive to the details of the muon identification algorithm. In this thesis, the capability of discriminating muons from the signal lepton flavor violating event, τ → 3µ, against background muons from pile-up interactions is investigated.
The CMS muon reconstruction algorithms have shown excellent performance for medium-pT (5-10 GeV), intermediate-pT (10-200 GeV) and high-pT (> 200 GeV) muons. The purpose of my thesis is to extend these standard methods to the low-pT region (< 3-4 GeV), for possible future applications in new physics searches at CMS.
In this thesis a feasibility study, exploiting Machine Learning techniques, has been carried out with the goal of discriminating signal muons from background ones. In particular, a multivariate discriminator has been developed, combining several input variables describing the quality of the muon reconstruction, the energy deposits in the calorimeters and the timing information, targeting very high efficiency on signal muons and an efficiency of a few percent on the background.
The thesis is structured in 4 chapters.
Chapter 1 describes the theoretical framework of the Standard Model and the basic concepts of different theories beyond the Standard Model, with particular reference to the object of this project: low-momentum muons as a tool to investigate new physics.
Chapter 2 presents the experimental framework: the LHC and the main
components of the CMS apparatus.
Chapter 3 explains the standard muon reconstruction and identification algorithms used in CMS.
Chapter 4 presents in detail the strategy of the analysis: in a first step I investigated the background composition, in order to identify the background muons; then I trained a Multilayer Perceptron algorithm to discriminate signal muons (muons from τ decay) from background ones. Finally, I validated its performance on a kinematically known sample, obtaining better results with respect to the standard muon identification algorithms.
Further details on the structure and operation of the Machine Learning algorithms used in the study are reported in Appendix A.
Finally, Appendix B collects a set of supporting results on the standard reconstruction algorithms at CMS.
Chapter 1
The theoretical scenario at LHC
The physics studied at the Large Hadron Collider (LHC) is the physics of the "infinitely small", the physics of the elementary particles. The Standard Model (SM) of Particle Physics is the best theoretical model so far to describe the elementary particles and three of the four known interactions existing among them. The SM has been successfully tested from the experimental point of view, with high precision, at lepton and hadron colliders in past years (SppS, LEP, Tevatron) and in the present day. Until the LHC era, only one piece of the theory was missing for its complete proof: the existence of the Higgs boson, whose discovery is one of the main achievements of the LHC and its experiments. More data have been collected to check whether the properties of this new particle imply physics beyond the Standard Model and whether new physics exists beyond the hundreds-of-GeV scale.
This Chapter is dedicated to the explanation of the theoretical perspective
that drives the LHC searches and aims to fathom the limits of the SM.
1.1 The Standard Model of the Particle Physics
The Standard Model, developed by Glashow, Weinberg and Salam in the 1960s, is a renormalizable quantum field theory (QFT) which incorporates quantum mechanics and relativity to describe natural phenomena at subnuclear scales, providing a description of the electromagnetic, strong and weak interactions.
The basic SM paradigm is that there is a set of elementary particles constituting matter, and its mathematical description at a fundamental level is based on the field concept, i.e. on wavefunctions associated with points in
spacetime, to which a local probability can be associated [1]. Classically, fields were just a mathematical abstraction: the real things were forces, due to a field that propagates in the form of a wave. With the quantum mechanics of fields, interactions are described as quantum excitations of relativistic fields: these excitations are particles, and their interactions happen through the exchange of other intermediary particles. So all particles are described by relativistic fields, but interaction fields and matter fields have different behaviors: matter fields (particles) interact by exchanging interaction fields, whose excitations are the intermediary particles. Whereas matter fields satisfy the Pauli exclusion principle (only one particle can occupy a given quantum state), obey Fermi-Dirac statistics and are called fermions, there is no limit to the number of identical and indistinguishable interaction field quanta that can occupy the same quantum state: they obey Bose-Einstein statistics and are called bosons.
The spin of a particle and the statistics it obeys are connected by the spin-statistics theorem, according to which fermions have half-integer spins and bosons have integer spins. For each known particle there is an anti-particle counterpart (antimatter field), with the same mass and opposite charge quantum numbers.
At the present energy scale there are 12 elementary matter fields and, according to our current knowledge, they can be divided into two big families: 6 leptons and 6 quarks. Each big family in turn can be divided into 3 generations, with similar properties but different masses. This is summarized in figure 1.1: the first 3 columns represent the matter fields, while the last one shows the interaction fields, called gauge bosons. The separate box represents the Higgs particle, whose role will become clear later.
At the current energy scale of the Universe, particles interact via 4 fundamental interactions. There are indications that this view is related to the present-day energy of the Universe: at higher energies, i.e. earlier epochs, the interactions would "unify"; the current theory foresees these interactions to be remnants of one single interaction that would occur at the extreme energies of the beginning of the Universe [1]. In increasing order of strength it is possible to distinguish:
• the gravitational interaction, acting between any pair of bodies and dominant at macroscopic scales.
• the electromagnetic interaction, acting between pairs of electrically charged particles, i.e. all matter fields excluding neutrinos.
Figure 1.1: Presently observed elementary particles [1]. The quark and lepton flavours are shown too.
• the weak interaction, acting between all matter fields with certain selection rules.
• the strong interaction, responsible for binding the atomic nuclei; more precisely, the color force acting among quarks.
Each of these interactions, according to the quantum mechanical view, manifests itself through the exchange of intermediate particles, called the quanta of the force field: the quantum of the electromagnetic force is the photon γ; the weak interaction is mediated by 3 bosons, one neutral Z and two charged W±; finally, the quanta of the strong force are 8 gauge bosons, called gluons.
The coupling of each particle to the boson(s) associated to a given interaction is determined by the strength of the interaction and by a quantum number called "charge". The gravitational charge of a particle is proportional to its mass; the weak charge is the weak isospin charge (±1/2 for fermions and 0, ±1 for bosons); the electrical charge is the positive or negative electric charge; the strong charge is called color, and comes in 3 types: red, green and blue.
The Standard Model is built on the concept of local gauge invariance of its Lagrangian. What does this mean?
The concept of invariance is strictly connected to that of symmetry: if a system is said, for example, to be invariant with respect to some transformations in space, one can say that the laws of physics are the same everywhere and hence that they are symmetric. More generally, a symmetry is an invariance under a transformation or a group of transformations [1].
The dynamical description of a particle system in the quantum world can be expressed by the so-called Lagrange function L = K − V, where K is the kinetic energy and V the potential one. In fact, the equations of motion, called the Euler-Lagrange equations, have the form:

d/dt (∂L/∂q̇_i) − ∂L/∂q_i = 0

with q_i the coordinates and L the Lagrange density function¹. Noether's theorem demonstrates that symmetries of this function with respect to given operations entail conservation laws and therefore conserved quantities, which are observed in natural phenomena.
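As a concrete illustration (not part of the thesis), the Euler-Lagrange equation above can be derived symbolically. The following sketch assumes the sympy library and uses a simple harmonic oscillator, L = ½mq̇² − ½kq², as the example system; the helper returns an equation equivalent to mq̈ + kq = 0:

```python
import sympy as sym
from sympy.calculus.euler import euler_equations

t = sym.symbols('t')
m, k = sym.symbols('m k', positive=True)
q = sym.Function('q')(t)

# Harmonic oscillator: L = (1/2) m qdot^2 - (1/2) k q^2
L = sym.Rational(1, 2) * m * q.diff(t)**2 - sym.Rational(1, 2) * k * q**2

# d/dt(dL/dqdot) - dL/dq = 0  should reduce to  m qddot + k q = 0
eq = euler_equations(L, q, t)[0]
assert sym.simplify((eq.lhs - eq.rhs) + (m * q.diff(t, 2) + k * q)) == 0 \
    or sym.simplify((eq.lhs - eq.rhs) - (m * q.diff(t, 2) + k * q)) == 0
```
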
Therefore the Lagrangian of the SM is invariant under gauge transformations, in the sense that it possesses a gauge symmetry, where a gauge transformation is defined in this way:

Aµ −→ Aµ + ∂µθ

with Aµ a generic field and θ an arbitrary function. If θ depends on the space-time coordinates x = (ct, x, y, z), the transformation is called local, otherwise global [2]. By Noether's theorem, the invariance of the Lagrangian under global gauge transformations is related to the global conservation of a charge of the system, while local invariance is related to its local conservation: the latter is what ensures the conservation law holds in the evolution of a system from the initial state to the final state.
In a theory of the interactions such as the SM, the equations of motion (which have the form of the Schroedinger, Klein-Gordon and Dirac equations) are modified to incorporate explicitly the coupling with the interaction fields. The introduction of these new terms makes the equations invariant under a combined local gauge transformation of the matter and interaction fields. Conversely, requiring that the matter quantum equations be invariant with respect to local gauge transformations within some internal symmetry group implies the existence of well-defined interaction fields, then called gauge fields.
¹ The Lagrange function L is usually written as the integral over the spatial coordinates x = (x, y, z) of the Lagrangian density: L(t) = ∫ d³x 𝓛(x)
For completeness, the gauge transformation of the wave function associated to the field Aµ can be written in a more general form as:

ψ(x) → e^{iθA} ψ(x)   (1.1)

where A is a generic unitary operator, e^{iθA} indicates the phase term and ψ is the matter field.
In more detail, the mathematical formulation of the SM is based on the local gauge invariance of its Lagrangian under the SU(3)_C × SU(2)_L × U(1)_Y group: the invariance under SU(3) provides a way to describe the color interaction, Quantum Chromodynamics (QCD); the symmetry group SU(2) × U(1) describes instead the weak and electromagnetic interactions, the so-called electroweak interaction.
Finally, saying that it is a renormalizable theory means that no divergent terms must be present in its formulation: as will be explained later, these infinities are absorbed by introducing a dependence of the coupling constants (one for each interaction) on the energy.
1.2 QCD: Quantum Chromodynamics
The strong interaction is modeled by QCD, Quantum Chromodynamics, a theory exploiting the invariance of the strong interaction with respect to rotations among 3 different elements in "color space". The minimal representation for such a symmetry is the non-Abelian gauge group SU(3)_C.
In general, SU(3) denotes the group of special unitary 3-dimensional matrices, which represent space-time dependent rotations in the complex plane, where "special" means that the determinant of the matrices is equal to 1. Since the SU(n) group has n² − 1 free parameters, and therefore n² − 1 generators, this group has 8 generators (λ_a/2, with λ_a the Gell-Mann matrices) that represent a color exchange and obey the following commutation rules:

[λ_a/2, λ_b/2] = i f_abc λ_c/2

where f_abc are the structure constants of the group and characterize the finite transformations in a suitable neighborhood of the unit transformation [2]. This means that the generators do not commute, and from here comes the non-Abelian character of the group. This mathematical structure derives from the conservation of the colour charge.
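These commutation rules can be verified numerically for a concrete case. A minimal sketch (not from the thesis), assuming numpy and using the first three Gell-Mann matrices, for which the structure constant is f_123 = 1:

```python
import numpy as np

# First three Gell-Mann matrices: the SU(2)-like subalgebra of SU(3)
l1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=complex)
l2 = np.array([[0, -1j, 0], [1j, 0, 0], [0, 0, 0]], dtype=complex)
l3 = np.array([[1, 0, 0], [0, -1, 0], [0, 0, 0]], dtype=complex)

def comm(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return a @ b - b @ a

# [l1/2, l2/2] = i f_123 l3/2 with f_123 = 1: the generators do not commute
assert np.allclose(comm(l1 / 2, l2 / 2), 1j * l3 / 2)

# Generators are traceless and Hermitian, as required for an SU(3) algebra
for l in (l1, l2, l3):
    assert np.isclose(np.trace(l), 0) and np.allclose(l, l.conj().T)
```
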
Equation 1.1 takes the form:

ψ(x) −→ e^{i g_s (λ_a/2) θ_a} ψ(x)

and the gauge fields needed to assure the invariance of the wave equations under these transformations are the gluons.
Quarks and antiquarks appear in the fundamental 3 and 3̄ representations of the gauge group SU(3)_C, the so-called flavour-SU(3) group (theory of the eightfold way), where 3 represents the colour number and 3̄ the anti-colour one: this is why one speaks of a "color triplet" [2]. These particles constitute the so-called hadrons, i.e. colour singlets that can be obtained by binding in a single particle all three colours, or all three anticolours, or a colour charge and the equivalent anticolour charge.
The only stable hadronic states are color-neutral, so free objects can't be colored and it is not possible to observe free quarks ("confinement" phenomenon). Hadrons can therefore be of two types: (anti)baryons, consisting of three (anti)quarks of different (anti)colours, and mesons, consisting of a quark of a certain colour and an antiquark which carries the corresponding anticolour.
Gluons are in the so-called adjoint representation, and thus their number is the same (8) as the number of generators of the SU(3)_C group: in this case one speaks of a "color octet". They are bi-colored particles: they carry color charge themselves and can interact with each other.
The dynamical structure of the group is expressed by the Lagrange density for SU(3):

L_QCD = ψ̄ (iγ^µ ∂_µ − m) ψ − g_s ψ̄ γ^µ (λ_a/2) G^a_µ ψ − (1/4) G^a_µν G_a^µν   (1.2)

where the first term is the free Dirac Lagrange density for a fermion field ψ of spin 1/2 at space-time position x and mass m, with γ^µ the Dirac matrices: it represents the free quark propagation; the last term represents the kinetic tensorial term for the gauge bosons; the middle one represents the interaction between the fermion fields and the gluons: G^a_µ are the vector fields (gluons), with a running from 1 to 8, and g_s represents the strength of the interaction, i.e. the strong coupling constant, often parametrized as:

α_s = g_s² / 4π   (1.3)

It is not really constant, but it runs with energy (running coupling constant).
The running strong coupling constant
In quantum physics, the vacuum is not really empty! It is the state of minimum energy and consists of pairs of virtual particles. From the electromagnetic point of view, a point-like charge polarizes the vacuum, creating electron-positron pairs which orient themselves as dipoles, screening the charge itself. The force carriers of the strong force (gluons) have color charges themselves and increase the number of force-carrying particles: their screening effect is opposite to that of the quarks (see figure 1.2).
Figure 1.2: Vacuum polarization with coloured particles [3].
Between the two effects, gluon screening dominates, thus the coupling constant becomes larger at lower energies and eventually becomes infinitely strong (color confinement). This behavior is shown in figure 1.3, together with the experimental measurements at different energy scales.
The dependence of α_s on the energy (momentum transfer) has the form:

α_S(Q²) = α_S(µ²) / [1 + (α_S(µ²)/12π) (11n_c − 2n_f) ln(Q²/µ²)]   (1.4)

where n_f indicates the number of flavours and n_c the number of colors (equal to 3), and µ is a scale parameter that must be determined experimentally. The decrease is only logarithmic, so the strength of the coupling remains sizable even at very large energies; asymptotically, however, the coupling vanishes and quarks behave as free particles (asymptotic freedom).
Thanks to the results from the Tevatron and from the LHC, the energy scales
at which αs is determined now extend up to more than 1 TeV [4].
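Equation 1.4 is easy to evaluate numerically. A minimal sketch, not from the thesis: the reference value α_s(m_Z) ≈ 0.118 and the fixed n_f = 5 are illustrative simplifications, since in practice n_f changes across the quark-mass thresholds:

```python
import math

def alpha_s(q2, alpha_mu2, mu2, nc=3, nf=5):
    """One-loop running strong coupling of eq. 1.4."""
    b = (11 * nc - 2 * nf) / (12 * math.pi)
    return alpha_mu2 / (1 + alpha_mu2 * b * math.log(q2 / mu2))

mz = 91.19           # GeV, reference scale
a_mz = 0.118         # illustrative alpha_s(m_Z)

# Asymptotic freedom: the coupling shrinks (logarithmically) as Q grows
print(alpha_s(200.0**2, a_mz, mz**2))  # below 0.118
print(alpha_s(10.0**2, a_mz, mz**2))   # above 0.118
```
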
Figure 1.3: Summary of measurements of αs as a function of the
momentum transfer Q, at several perturbative orders [4].
1.3 The electroweak interaction
The electromagnetic and weak interactions appear to be very different at low energies and are modelled using two different gauge theories. However, at energies of the order of 100 GeV, they can be considered two manifestations of the same force: in fact, in 1968 Glashow, Salam and Weinberg unified the two forces, leading to the so-called ElectroWeak (EWK) theory. The experimental proof of the validity of this theory came in 1983 with the discovery of the W± and Z0 bosons by the UA1 and UA2 collaborations in proton-antiproton collisions [2]. From the formal point of view, the unification is accomplished under an SU(2)_L × U(1)_Y gauge group.
In general, SU(2) denotes the group of special unitary 2-dimensional matrices
and is the group of the spin rotations; it has 3 free parameters and 3 gener-
ators, that are the Pauli matrices, and it is connected to the conservation of
the �weak isospin� quantity, T.
If, in equation 1.1, A is chosen to be one of the generators of the SU(2) group, then the associated gauge transformation corresponds to a local rotation in spinor space. The invariance of the wave equations under these local gauge transformations leads to the need for 3 gauge fields: W± and Z. SU(2)_L in particular refers to the left-handed fermion fields (or, equivalently, the right-handed anti-fermion fields) that participate in weak interactions, because here parity (inversion of the space axes) is not a symmetry of the group.
U(1) is the group of unitary 1-dimensional matrices and has 1 free parameter and 1 generator, the "weak hypercharge", connected to the electrical charge by the Gell-Mann-Nishijima relation: Q = T_z + Y/2, where T_z is the third component of the weak isospin. It is represented by a space-time dependent rotation in a complex plane, so that the multiplication of the state equation of a particle by a member of this group produces a phase change. The invariance under phase changes allows the formulation of the theory to be possible independently of the choice of phase. This invariance leads to the conservation of the weak hypercharge.
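The Gell-Mann-Nishijima relation can be checked against the known fermion charges. A small illustrative sketch (not from the thesis), using the standard left-handed doublet hypercharges Y = −1 for leptons and Y = +1/3 for quarks:

```python
from fractions import Fraction as F

def charge(t3, y):
    """Gell-Mann-Nishijima relation: Q = T_z + Y/2."""
    return t3 + y / 2

# Left-handed doublets: (nu, e)_L has Y = -1; (u, d)_L has Y = +1/3
assert charge(F(1, 2), F(-1, 1)) == 0           # neutrino
assert charge(F(-1, 2), F(-1, 1)) == -1         # electron
assert charge(F(1, 2), F(1, 3)) == F(2, 3)      # up quark
assert charge(F(-1, 2), F(1, 3)) == F(-1, 3)    # down quark
```
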
Calling W1, W2 and W0 the three gauge fields of SU(2), and hence W^a_µν (with a = 1, 2, 3) the field tensors of this group, and labeling B_µν the field tensor of U(1), it is possible to write the EWK Lagrange density as:

L_EW = Σ_families ψ̄ (iγ^µ ∂_µ) ψ − ψ̄ γ^µ (g W^a_µ τ_a/2 + g′ B_µ Y) ψ − (1/4) W^a_µν W^{aµν} − (1/4) B_µν B^µν   (1.5)
where τa are the Pauli matrices and Y the hypercharge.
The first term indicates the free Dirac Lagrangian for a massless fermion field; the last two are the kinetic parts related to the gauge fields; the middle terms represent the interaction between the gauge fields and the fermion fields, where g is the strength of the SU(2) coupling and g′ the strength of the hypercharge coupling, related to each other by the relation:
g sin θW = g′ cos θW = e
where e is the electric charge and θW the Weinberg angle, which describes the electroweak mixing between the gauge fields W0 and B0 that gives rise to the physically observed neutral fields, the Z0 and the photon.
At this point, the theory of the electromagnetic and weak interactions, as described, is unsatisfactory: it contains four massless gauge bosons, W^a and B, while experimentally only the photon is massless and the others are observed with masses of the order of 100 GeV. But while in QCD fermion masses can be added "by hand" without any symmetry violation, here an explicit mass term (a quadratic term in the Lagrangian) breaks the SU(2) gauge symmetry; moreover, the weak carriers must be massive to explain the weakness and short range of the interaction. Is there a way to generate the gauge boson and fermion masses without violating gauge invariance?
The answer is in the so-called Higgs mechanism.
1.4 The Higgs mechanism
R. Brout, F. Englert and P. Higgs in the 1960s formulated an elegant theory that introduces the masses of the particles, requiring the presence of a new massive scalar particle, called the Higgs boson. The mechanism by which this new particle appears is commonly called the Higgs mechanism and it is related to spontaneous symmetry breaking (SSB): the system reaches a state of minimum energy (vacuum state) in which part of the symmetry is hidden from the spectrum. As a consequence, the gauge bosons become massive and appear as physical states. This is possible by introducing the following gauge invariant potential in the theory:
V(ϕ) = −µ² ϕ†ϕ + λ (ϕ†ϕ)²   (1.6)

where µ² and λ are both positive constants and ϕ is a scalar doublet field, defined as ϕ(x) = (ϕ1(x), ϕ2(x))ᵀ, where ϕ1(x) and ϕ2(x) are complex scalar fields.
With this choice of signs, the minimum of the potential lies at non-zero values of the field: in the ground state the system sits in one of a set of degenerate, asymmetrical positions of minimum. None of these equilibrium positions, taken alone, shows the symmetry of the potential under SU(2) transformations, so the symmetry is spontaneously broken. In other words, once an equilibrium position is chosen, the symmetry of the potential becomes hidden [2].
Expanding the potential around the minimum points, the problems with the masses are solved, because additional terms, representing the interaction of the Higgs boson with the gauge bosons and the fermions, appear in the Lagrangian; they are the so-called Yukawa terms. Computing the vacuum expectation value of these terms, which in quantum mechanics corresponds to the ground state of the system, it is possible to obtain the particle masses:

m²_W = ρ₀² e² / (4 sin²θ_W)
m²_Z = ρ₀² e² / (4 sin²θ_W cos²θ_W)   (1.7)
m_f = c_f ρ₀ / √2
where ρ₀ = √(µ²/λ) is the vacuum expectation value of the Higgs field; c_f is the Yukawa coupling of the fermion f, a free parameter of the theory that without loss of generality is taken positive; the subscript f indicates the fermion type.
From the interaction of the Higgs boson with itself its mass term also appears:

m²_H = 2λρ₀²

and, because of its dependence on the λ parameter, which corresponds to the Higgs field self-coupling, its value is not predicted by the theory.
The Standard Model Lagrangian finally can be obtained by adding all the mentioned contributions (Yukawa terms, kinetic terms and mass terms) and its free parameters are in total 5: g, g′, c_e, µ and λ, or equivalently e, sin θ_W, m_e, m²_W and m²_H, which are experimentally measured.
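The tree-level mass relations of equation 1.7 can be evaluated with representative numbers. The following sketch is illustrative only (the inputs ρ₀ ≈ 246 GeV, sin²θ_W ≈ 0.2312 and α ≈ 1/128 are standard reference values, not taken from the thesis), and yields masses close to the measured m_W ≈ 80.4 GeV and m_Z ≈ 91.2 GeV:

```python
import math

# Illustrative reference inputs (not from the thesis)
alpha_em = 1 / 128.0    # electromagnetic coupling near the m_Z scale
sin2_thw = 0.2312       # sin^2 of the Weinberg angle
rho0 = 246.22           # Higgs vacuum expectation value, GeV

e = math.sqrt(4 * math.pi * alpha_em)
cos_thw = math.sqrt(1 - sin2_thw)

# Tree-level masses from eq. 1.7
m_w = math.sqrt(rho0**2 * e**2 / (4 * sin2_thw))
m_z = math.sqrt(rho0**2 * e**2 / (4 * sin2_thw * (1 - sin2_thw)))
print(f"m_W ~ {m_w:.1f} GeV, m_Z ~ {m_z:.1f} GeV")
```
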
1.5 Standard Model re-discovery at CMS
Since the main tool of this thesis for the investigation of the SM is the CMS experiment at the LHC, it is appropriate, for completeness, to report its contribution to the direct search for SM signals (the apparatus itself is described in Section 2.1). Figure 1.4 summarizes the "state of the art" of its research.
The CMS program was extensive, ranging from the electroweak sector (single W, Z production cross sections and associated production) to the strong sector (top quark production), up to the most competitive challenge: the Higgs boson existence.
The ATLAS and CMS experiments announced the observation of a 125 GeV Higgs boson in July 2012. The discovery has been confirmed with the data collected in Run 2. Figure 1.5 shows the 4-lepton (muons, electrons) invariant mass spectrum, obtained in the H → ZZ search, with the data collected by CMS during Run 2.
The first direct probe of the fermionic couplings was the Higgs boson decay to τ leptons, observed in the CMS results obtained at the end of Run 1 [7]. During Run 2, the increased center-of-mass energy and the larger data set made it possible to probe other channels. Over the past year, evidence for the Higgs boson decay to bottom quarks has been obtained and the production of the Higgs boson together with top quarks has been observed [8]. The most accurate way to summarize what we currently know about the interaction of the Higgs boson with the other particles of the SM is to compare its interaction strength with the mass of each particle, as shown in figure 1.6.
Figure 1.4: Summary of measurements of SM cross sections in Run 1 and Run 2 [5].
Figure 1.5: Mass spectrum of the 4 leptons in the full mass range. The 3 peaks are, in order of increasing mass, the decay of the Z, the Higgs boson decaying into Z(l+l−)Z∗(l′+l′−) and di-boson production of two Z(l+l−). Points with error bars represent the data and the stacked histograms represent the expected signal and background distributions [6].
Figure 1.6: Higgs boson coupling strength to each particle (error bars) as a function of the particle mass, compared with the SM prediction (blue dotted line) [9].
This clearly shows that the Higgs interaction strength is tied to the mass of the particles: the heavier the particle, the stronger its interaction with the Higgs field. This is one of the main predictions of the Higgs mechanism in the Standard Model. These couplings are in excellent agreement with the SM prediction over a range covering 3 orders of magnitude in mass. Deviations from these predictions could be a hallmark of new physics.
Recent highlights in the CMS research plan include the first observation of the electroweak production of same-charge W boson pairs [10], the first evidence for the production of a top quark with a photon [11], and detailed investigations of single top quark and top quark pair production as a function of the event characteristics, which can be used to measure SM parameters, such as the top quark mass and the strong coupling constant. These studies are still in progress.
Most of the already published CMS results from Run 2 are based on data
recorded in 2015 or 2016, while the total number of proton-proton collisions
accumulated in Run 2 is more than three times larger. The large new dataset will be used to expand the direct searches for new physics, in particular for rare events with unusual signatures such as long-lived, heavy particles (see the next section for more details).
1.6 Open questions and Physics beyond SM (BSM)
The primary way to uncover new physics is to explore the high-energy frontier at colliders, where heavy new particles can be directly produced and studied. The physics program at the LHC in fact focuses on answering fundamental questions in particle physics. What is the origin of the masses of the elementary particles? What is the nature of the dark matter we observe in the Universe? Are the fundamental forces unified? Do the properties of matter and antimatter differ?
The hierarchy problem
The first of these questions seems to have found an explanation with the discovery
of the Higgs boson of mass ∼125 GeV, but at the same time another question arises, the so-called hierarchy problem.
The SM introduces particle masses through SSB caused by the Higgs field;
within the theory, these masses differ from the fundamental values (or
bare masses, the masses at the Planck energy scale), and quantum corrections
due to the presence of virtual particles are introduced to account for this
difference: this prescription is known as renormalization. In the case of the
Higgs boson, these corrections are much larger than the measured
mass, and consequently the bare-mass parameter of the Higgs must
be "fine-tuned" in order to cancel such quantum corrections. However, the
level of fine-tuning required in the SM is considered unnatural. Therefore new
physics is expected to be found at the electroweak scale, about 10² GeV.
The unification issue: SUSY and GUT
New physics should appear at a mass scale not too far from 1 TeV to tame the
growth of the Higgs mass corrections mentioned in the previous section. This energy scale is that of
a powerful extension of the SM, called SUperSYmmetry (SUSY), which
predicts that every ordinary fundamental particle has a partner particle: a
bosonic partner for each fermion and vice versa. These SUSY partners play
an important balancing role in the description of nature and solve the
hierarchy problem. In SUSY the Higgs sector is modified, so that the
properties of the Higgs boson can deviate from those expected in the SM, and
additional Higgs bosons are predicted.
The 1 TeV energy scale is accessible at LHC but, to date, no SUSY particles
have been observed, and confidence limits have been set on the production
cross section and mass [12]. The search for SUSY below the TeV scale is one of
the physics goals of the next LHC runs.
Present-day experiments do not only test whether the properties
of the Higgs boson are in line with those predicted by the SM, but
specifically look for properties that provide evidence of new physics. For
example, by constraining the branching ratio with which the Higgs boson
decays into combinations of invisible or unobserved particles, stringent limits
are set on the existence of new particles with masses lower than that of the
Higgs boson. So far, none of these searches has found anything unexpected
[9]: the challenge is still ongoing.
Another theory proposed to extend the Standard Model is the Grand Unification
Theory (GUT).
The SM is constructed on the basis of a gauge group which introduces three
distinct coupling constants; these have to be measured experimentally and
change with energy, as in any renormalizable theory (see Section 1.2),
but a true unification is not achieved (see figure 1.7, left).
GUTs foresee a unification of the EW and strong forces within larger
gauge groups, such as SU(5), at energy scales ΛGUT ∼ 10¹⁵-10¹⁶ GeV (figure 1.7, right),
which are close to the scale of quantum gravity, ΛPlanck ∼ 10¹⁸ GeV [1], where gravitational effects become important.
The SM does not provide answers to the remaining questions above.
It is a successful theory and so far it has provided accurate predictions,
verified experimentally over the last half century at the electroweak
scale. Nevertheless, a description of what happens beyond this scale is missing.
All these unanswered questions suggest that it cannot be considered the
exhaustive theory of particle physics, but only a "low energy" approximation
of a more fundamental theory.
Figure 1.7: Evolution of the SU(3)×SU(2)×U(1) gauge couplings to high
energy scales, with the renormalization group equations of the SM (left) and
of the SUSY generalization of the SM (right) [1].
Dark Matter
Cosmological and astrophysical observations show that the SM describes only
15% of the total matter present in the universe; the remaining 85% is Dark
Matter (DM) [12]. Several candidates exist for dark matter.
The favoured one is the Weakly Interacting Massive Particle (WIMP), an
electrically neutral, colorless, stable particle with a mass in the range of the
electroweak scale [13].
The DM-SM interaction can be probed in several complementary and
interdisciplinary ways: Direct Detection (DD) experiments look for evidence
of DM-nucleus elastic scattering; Indirect Detection (ID) experiments search
for SM particles from DM annihilation or decay; and detection at colliders.
The experiments at LHC in fact rely on detecting dark-sector particles produced
in pp collisions, essentially exploiting the "E_T^miss + X" or "Mono-X"
signature, where X stands for SM particles that tag the event and E_T^miss is
the missing energy² associated with the invisible particles.
²In the presence of weakly interacting particles, such as neutrinos, which can
escape detection, the conservation of the total transverse momentum, expected in
head-on collisions, cannot be verified from the visible particles alone. The missing
transverse energy (MET or E̸_T) is therefore introduced, defined as
$\sqrt{\left(-\sum_i p_x^i\right)^2 + \left(-\sum_i p_y^i\right)^2}$, where the sum runs over
all the particles detected in the collision; in natural units (c = ℏ = 1) energy and
momentum have the same dimension.
From an operational point of view, MET is evaluated as the vector sum in the transverse
plane of the energy deposits detected by calorimeters.
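The definition in the footnote can be sketched in a few lines; this is a minimal illustration with made-up particle momenta, not CMS reconstruction code:

```python
import math

def met(particles):
    """Missing transverse energy: magnitude of minus the vector sum
    of the transverse momenta (px, py) of all detected particles."""
    sum_px = sum(px for px, _ in particles)
    sum_py = sum(py for _, py in particles)
    return math.hypot(-sum_px, -sum_py)

# Illustrative event: three visible particles with (px, py) in GeV.
visible = [(30.0, 10.0), (-20.0, 5.0), (0.0, 0.0)]
print(met(visible))  # sqrt(10^2 + 15^2) ~ 18.03 GeV
```

A fully balanced event gives MET = 0; a large MET hints at one or more undetected particles.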
There have been numerous searches for DM as WIMPs, at the LHC and else-
where, but no evidence for WIMPs has emerged [13]. This motivates looking
beyond the WIMP paradigm.
In general, dark sectors can feature new particles and forces with signatures
not found in the WIMP scenario: one such new particle is a "dark photon".
A compelling dark-force scenario in fact involves a massive dark photon, A′,
whose coupling to the electromagnetic current is suppressed relative to that
of the ordinary photon, γ, by a factor ε. Theoretically the dark photon
does not couple directly to charged SM particles; however, a coupling may
arise via "kinetic mixing" between the SM hypercharge and A′ field-strength
tensors. This mixing provides a potential "portal" through which dark photons
may be produced, if kinematically allowed [14].
Another DM candidate is the neutralino, the lightest SUSY particle, a
mixture of the superpartners of the Higgs and electroweak gauge bosons.
Particle physics considerations alone require the neutralino to be electrically
neutral, effectively stable and weakly interacting, with a mass of order 100
GeV. Remarkably, these properties are consistent with the possibility that
the thermal relic density of neutralinos makes up most of the missing mass
of the universe.
No evidence for DM has been observed so far but there is much more phase
space to be explored [13].
Matter-antimatter asymmetry
Another fundamental physical phenomenon not explained by the SM is related
to the matter and antimatter abundances in the universe. The SM predicts
that they should have been created in almost equal amounts, but the Universe
seems mostly made of matter, practically without antimatter. This implies
a particle-antiparticle asymmetry and suggests that CP (the combination of
charge and parity transformations) may not be a symmetry of all interactions.
Despite the phenomenological success of the description of charge-parity
symmetry violation (CPV) [15][16][17] in the SM, in agreement with experimental
results, the predicted violation is small, not enough to explain the
observed asymmetry! There must probably be additional sources of CPV
besides those currently known.
The recent evidence from solar and atmospheric anomalies, which proves that neutrinos
have finite masses, suggests another possible source of CPV in the lepton
sector, although the expected size of the effect is small. The Super-Kamiokande
experiment [18] in Japan has in fact observed νµ ↔ ντ oscillations, which can
happen only if neutrinos of different flavors have different masses. However, in
the original SM formulation, the neutrino masses and the lepton mixing angles
(described by the so-called Pontecorvo-Maki-Nakagawa-Sakata, PMNS, matrix)
are not included as parameters: this is an experimental result that cannot be
explained by the SM.
Neutrino oscillations open the door to lepton flavor violation (LFV) channels and
motivate searches for new physics in rare or highly suppressed flavor-changing
neutral-current reactions. In general, "family" or flavor number is not
a symmetry of the Lagrangian, unlike electric charge: quark family number is
violated in weak decays, a phenomenon described by a 3x3 matrix called the
Cabibbo-Kobayashi-Maskawa (CKM) matrix. Moreover, as just mentioned,
it is violated in the neutral lepton sector. Nevertheless, flavor violation has
never been observed in the charged lepton sector. Most "natural" new physics
models predict that it should already have been seen, even if small. So the
rates of Charged LFV (CLFV) processes are expected to provide information
regarding the nature of new physics.
1.6.1 Low momentum muons as a tool to probe New
Physics
Very soft muons, with pT < 3-4 GeV, can be considered an interesting
probe of the presence of new physics. In this paragraph I will discuss
two physics cases that triggered the interest of the scientific community,
in which low momentum muons play a crucial role.
o Charged Lepton Flavor Violation τ → 3µ
Leptonic flavour conservation is so far a phenomenon observed experimentally
without a corresponding symmetry in the SM: there are no known symmetries
that strictly forbid lepton-flavor violating decays, such as ℓ → ℓ′γ
or ℓ → 3ℓ′. In the SM, due to neutrino oscillations, such decays are possible,
albeit with extraordinarily small branching fractions. At the LHC, the
τ → 3µ decay is one of the "cleanest" LFV decay channels. The currently
best experimental upper limit, set by Belle [19], is B(τ → 3µ) < 2.1 × 10⁻⁸
at 90% CL, while BaBar [20] set BR(τ → 3µ) < 3.3 × 10⁻⁸ [21].
The main motivation that focuses our attention on this τ decay type is that
the τ → 3µ decay has the advantage of a very clean final-state topology; its
BR is kinematically favoured and seems enhanced in some BSM models.
Moreover, it can be studied at pp colliders; indeed, LHC is a factory of τ leptons,
which are produced in abundance, O(10¹¹)/fb⁻¹, as shown in table 1.1.
Table 1.1: The expected inclusive number of τ leptons produced in D and B
meson decays at LHC. Numbers are from PYTHIA [22].

Process                          | Number of τ leptons (33 fb⁻¹)
pp → cc̄ + … , D → τν            | 4.0 × 10¹² (95% Ds, 5% D±)
pp → bb̄ + … , B → τν + …        | 1.5 × 10¹² (44% B±, 45% B⁰, 11% B⁰s, 0% B±c)
pp → bb̄ + … , B → D(τν) + …     | 6.3 × 10¹¹ (98% Ds, 2% D±)

The dominant sources at LHC are the Ds and the various B mesons; W and
Z production will provide a considerably smaller number (8 × 10⁸ per year),
but with more energetic taus.
Searches for τ → 3µ at colliders have been carried out and upper limits have been
set: LHCb (Run 1) obtained BR(τ → 3µ) < 4.6 × 10⁻⁸ and ATLAS (Run 1) BR(τ → 3µ) < 3.8 × 10⁻⁸, both at 90% CL.
Until the beginning of this year, the great absentee in this panorama was
CMS. In March 2019 its first public result was released: with 2016 data
(Run 2, 33 fb⁻¹ of integrated luminosity) an upper limit BR(τ → 3µ) < 8.9
× 10⁻⁸ at 90% CL was obtained [22], using the heavy-flavour channel (B and D decays).
The pT spectrum of muons coming from τ decay in the HF channel is particularly
soft (as will be shown in Chapter 4) and boosted in the forward direction.
The situation for τ → 3µ, although more promising, is still very challenging.
The present work has been developed in this context: I studied the background
composition and I provided an approach to discriminate these events
from signal events, taking advantage of the most recent multivariate analysis
techniques. The study is presented in Chapter 4 of this Thesis.
o Muons as a tool to test SUSY and Dark sector
Recently the CMS, ATLAS [23] and LHCb [14] experiments published results
on the search for new light bosons with mass in the range 0.25-8.5
GeV/c² [23], whose presence could be interpreted as a dark matter candidate
(dark photon, γD) or as a first hint of the lightest scalar foreseen by the Higgs
sector of some extensions of the SUSY models. The search looked for
the decay of the new boson into 4 muons, characterized by a very soft pT spectrum.
Also in this context the detection and identification of soft muons plays a
crucial role, and the muon identification developed in the present work could
be an additional tool for this kind of search.
This search has been performed on the data collected by the different experiments
in 2016. So far no excess has been observed in the data, and a model-independent
upper limit on the product of the cross section, branching fraction
and acceptance has been derived [23].
Nevertheless, these searches are still ongoing with the 2017/18 datasets and
are one of the physics goals of the next LHC runs.
Chapter 2
The CMS experiment at LHC
The Large Hadron Collider (LHC) [24], located in the world's largest particle
physics laboratory, the European Organization for Nuclear Research (CERN),
is the most powerful particle accelerator built to date. Many scientific successes
have been achieved since its start-up, but many open challenges still remain.
The Compact Muon Solenoid (CMS) is one of the main experiments at LHC,
designed primarily for the search for new physics signatures in proton-proton
(pp) collisions.
This chapter is dedicated to the description of the CMS detector and its
operational strategy, with greater emphasis on the muon reconstruction and
identification techniques, the main topics of my thesis.
2.1 The Large Hadron Collider
The LHC is a circular hadron collider, designed with the main goal of operating
at the TeV energy scale, corresponding to the energy scale of electroweak
symmetry breaking and of the Higgs mechanism.
2.1.1 Technical characteristics
The LHC was built in the tunnel that hosted the LEP electron-positron
accelerator: a 27 km long ring-shaped tunnel at a depth of about 100 m
underground. Two high-energy proton beams travel in opposite directions,
at speeds close to that of light, inside two tubes kept at ultra-high vacuum,
called beam pipes, before colliding.
The protons in the beams are distributed in "packets" or bunches, about 5 cm
long and 10 µm across, separated by 25 ns time steps,
corresponding to about 750 cm in spatial separation. The collision of two bunches
is called a bunch crossing (BX) and takes place at a rate of 40 MHz [10]. The
beams are steered and focused using a total of 1232 dipole and 392 quadrupole
superconducting magnets, cooled with liquid helium to a temperature T
= −271.3 °C (1.9 K), colder than outer space (2.7 K).
The proton beams reach the LHC ring after a series of pre-accelerators
bringing the protons to an energy of 450 GeV in three steps: the LINAC (LINear
ACcelerator) brings them to an energy of 50 MeV; the Proton Synchrotron
(PS) further accelerates them to 26 GeV; and finally the Super Proton Synchrotron
(SPS) injects them into LHC with an energy of 450 GeV [25].
The two proton beams then collide at four points along the ring, corresponding
to the positions of the four main LHC detectors/experiments: ATLAS (A
Toroidal Lhc ApparatuS), CMS (Compact Muon Solenoid), ALICE (A Large
Ion Collider Experiment) and LHCb (Large Hadron Collider beauty). Next
to these there are three other minor experiments, for a total of seven.
Figure 2.1 shows a complete diagram of the beam injection system.
Figure 2.1: Scheme of LHC and its experiments.
The number N of events per unit of time produced in pp collisions is expressed
as N = σ · L, where σ represents the cross section of the particular
process investigated and L the instantaneous luminosity (number of collisions
provided per unit of time and area). Assuming two proton beams with n bunches,
with respectively N₁ and N₂ protons per bunch, colliding at a frequency f,
the luminosity at LHC is defined as:

L = (f · n · N₁ · N₂) / (4π · σx · σy)

where σx and σy denote the transverse beam profiles of the bunches.
In order to achieve the design instantaneous luminosity of L = 10³⁴ cm⁻²s⁻¹,
each beam must be composed of 2808 bunches with 1.15 × 10¹¹ protons per
bunch, colliding at fBX = 40 MHz.
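The order of magnitude of the formula above can be checked numerically. In this sketch, the revolution frequency of 11245 Hz and the transverse beam size of ~16.7 µm are assumed nominal LHC values not stated in the text, and the 50 pb cross section is purely illustrative:

```python
import math

# Nominal LHC parameters; the beam size (~16.7 um) and revolution
# frequency (11245 Hz) are assumed values, not from the text.
f_rev = 11245.0              # revolution frequency [Hz]
n_bunches = 2808
N1 = N2 = 1.15e11            # protons per bunch
sigma_x = sigma_y = 16.7e-4  # transverse beam size [cm]

# L = f * n * N1 * N2 / (4 pi sigma_x sigma_y)
L = f_rev * n_bunches * N1 * N2 / (4 * math.pi * sigma_x * sigma_y)
print(f"L = {L:.2e} cm^-2 s^-1")   # ~1e34, the design luminosity

# Event rate N = sigma * L for an illustrative 50 pb cross section
# (1 pb = 1e-36 cm^2): a sub-hertz rate even at design luminosity.
sigma = 50e-36  # [cm^2]
print(f"rate = {sigma * L:.2f} Hz")
```

Note that f in the luminosity formula is the revolution frequency of a single bunch; the 40 MHz bunch-crossing rate is f_rev multiplied by the number of bunch slots.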
2.1.2 LHC between past and future
The LHC was designed to accelerate proton beams up to 7 TeV, resulting
in a centre-of-mass energy available for collisions of 14 TeV at a
maximal instantaneous luminosity of 10³⁴ cm⁻²s⁻¹. These design requirements
drove the choice of a proton-proton collider instead of a proton-antiproton one:
even though a p-p̄ machine has the advantage that both beams can be kept in
the same beam pipe, producing the number of antiprotons needed to
reach the desired luminosity is an unfeasible task.
LHC started operating in 2009. After a commissioning phase at 0.9 and 2.36
TeV, the first real data-taking periods took place in 2010-2011 with pp
beams at Ecm = 7 TeV and continued in 2012 at Ecm = 8 TeV.
This configuration, labeled "Run 1", lasted until 2012, when
it was necessary to suspend operations (Long Shutdown 1, or LS1) in
order to prepare for a second phase: "Run 2" labels the 2015-2018 period
of activity, following a series of improvements aimed at
increasing the center-of-mass energy to 13 TeV, with an instantaneous luminosity
of 8 × 10³³ cm⁻²s⁻¹, about 40 times greater than in the first period,
and then bringing it to almost 2 × 10³⁴ cm⁻²s⁻¹ (see figure 2.2).
In December 2018 another long shutdown (LS2) began, to
prepare the LHC to operate at an instantaneous luminosity further increased to
2 × 10³⁴ cm⁻²s⁻¹. A third operational phase, called "Run 3", will then
open, covering the period 2021-2023, after which a further
increase in integrated luminosity is expected (up to 3000 fb⁻¹, see figure 2.3) in what will be
called the High Luminosity LHC (HL-LHC), which should start in 2026. This
will be possible thanks to a series of modifications, principally the replacement
of the linear accelerator Linac 2 with the new Linac 4 and the installation of
more than 20 magnets with niobium-tin (Nb₃Sn) coils: a very fragile material,
but able to withstand higher magnetic fields than the niobium-titanium
(Nb-Ti) wire used in the current LHC magnets.

Figure 2.2: (a) Evolution of the peak luminosity at LHC between 2010 and
2018. (b) Total integrated luminosity during the years 2010-2018 at nominal
center-of-mass energy [26].
One of the main challenges for the LHC experiments, related to the change
of operating conditions in Run 3, will be dealing with the increased pile-up
(soft pp interactions producing low transverse momentum particles, see Section
2.3) and background, and with the possible deterioration of the detectors
caused by the very high radiation levels.
Figure 2.3: Instantaneous and integrated luminosity planned at LHC up to
2037 [27].
2.2 The CMS Experiment
The Compact Muon Solenoid (CMS) experiment is a general-purpose detector
(built for the investigation of a wide range of physics), with a cylindrical
geometry and forward-backward symmetry along the beam line. It covers
an extensive research program within the boundaries of the Standard Model
(including the search for the Higgs boson(s) in all decay channels) and
beyond (supersymmetry, extra dimensions and dark matter). It shares the
same objectives as the ATLAS experiment, but differs from it in many details,
thus avoiding the common systematic errors that would arise if a single
technique were used for both.
Coordinate system: conventionally, CMS uses a right-handed coordinate
system with the origin at the nominal collision point: the x-axis points
to the center of the LHC ring, the y-axis points upward, perpendicular
to the LHC plane, and the z-axis lies along the counterclockwise beam
direction, toward the Jura mountains from LHC Point 5. The azimuthal angle
φ is measured from the positive x-axis in the x-y plane, while the polar angle
θ is measured from the z-axis.
The interaction is commonly described by quantities invariant under longitudinal
Lorentz boosts; the energy and momentum of the outgoing particles in
the plane transverse to the beam direction are therefore dominant and are
denoted by ET and pT. For the same reason, a commonly used spatial coordinate
is the pseudorapidity η = −ln[tan(θ/2)], a good approximation of
the rapidity y, whose differences are invariant under Lorentz transformations
along the beam axis and which is more useful in hadron collider studies [25],
defined as y = ½ ln((E + p_z)/(E − p_z)), where E is the particle's energy
and p_z is the component of its momentum along the beam axis.
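The approximation η ≈ y for relativistic particles can be verified directly from the two definitions; the 10 GeV pion below is an illustrative example, not from the text:

```python
import math

def pseudorapidity(theta):
    """eta = -ln(tan(theta/2)), a function of the polar angle only."""
    return -math.log(math.tan(theta / 2))

def rapidity(E, pz):
    """y = 0.5 * ln((E + pz) / (E - pz))."""
    return 0.5 * math.log((E + pz) / (E - pz))

# Illustrative charged pion: p = 10 GeV, theta = 30 deg, m = 0.1396 GeV.
m, p, theta = 0.1396, 10.0, math.radians(30)
pz = p * math.cos(theta)
E = math.sqrt(p**2 + m**2)
print(pseudorapidity(theta), rapidity(E, pz))  # nearly equal, since E >> m
```

For massless particles the two quantities coincide exactly; the advantage of η is that it requires only the measured polar angle, not the particle's energy.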
2.2.1 The detector
The CMS detector is shaped like a cylindrical onion, 21.6 m long and 14.6
m in diameter, with several concentric layers of detectors built around the
beam pipe. Each layer is designed to detect a certain type of particle and
to measure its momentum or energy.
Figure 2.4: Perspective representation of the CMS detector, together with
all its components.
As shown in figure 2.4, going from the outermost layer to the heart of the detector,
the different detection systems are: the muon chambers (dedicated to muon
detection), the hadron calorimeter HCAL (designed to measure the energy of
hadrons by stopping them; it surrounds the collision point and prevents the
particles from escaping), the electromagnetic calorimeter ECAL (which measures
the energy of electrons and photons) and finally the central tracker (which
provides accurate momentum measurements).
However the fulcrum of the whole apparatus is a huge solenoid magnet, which
gives its name to the whole detector: a cylindrical coil of superconducting
wire (cooled to −268.5 °C), generating a 3.8 T solenoidal magnetic
field oriented along the beam line (CMS was designed to produce a magnetic
field of up to 4 T in the inner region; in order to maximize its lifetime, the
magnet is operated at 3.8 T). The high nominal value of the field is necessary
for determining the sign of high momentum particles. In fact, given the
considerable dimensions of the solenoid, 13 m long and 7 m in diameter, it is
possible to place the innermost detection systems (calorimeters + tracker)
inside it, so as to exploit the high magnetic field for accurate momentum
measurements even for high-momentum particles such as muons. Since the
greater the momentum of a particle, the smaller the curvature its track
undergoes in the magnetic field, outside the solenoid, where the muon chambers
are located, the magnetic field is confined by the iron return yoke and still
allows the muon trajectories to be tracked. This magnetic structure allows
measuring the momentum both inside the solenoid (tracking devices) and
outside it (muon chambers), exploiting the well known relation
p[TeV] = 0.3 · B[Tesla] · R[km], with R being the radius of the curved track.
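The relation above can be inverted to estimate how strongly a track bends; the momenta below are illustrative values chosen to span the range discussed in this thesis:

```python
def radius_of_curvature(pT_GeV, B_tesla):
    """R [m] from pT [GeV] and B [T], using p = 0.3 * B * R
    (the same relation as p[TeV] = 0.3 * B[T] * R[km])."""
    return pT_GeV / (0.3 * B_tesla)

# A 100 GeV muon in the 3.8 T CMS solenoid field:
print(radius_of_curvature(100.0, 3.8))   # ~87.7 m: an almost straight track
# A soft 3 GeV muon, the regime studied in this thesis:
print(radius_of_curvature(3.0, 3.8))     # ~2.6 m: a strongly curved track
```

This is why very soft muons are harder to reconstruct: their tracks curl significantly inside the detector volume, while high-momentum tracks are nearly straight and their curvature (hence momentum) is harder to measure precisely.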
Inner tracking system
The tracker, called the inner tracker, occupies the innermost position of the
entire detector and covers the range |η| < 2.5 [28], as shown in figure 2.5.
Its main task is to accurately and efficiently measure the trajectories of the
charged particles produced in pp collisions and to reconstruct the secondary
vertices and impact parameters for the identification of heavy hadrons, in an
environment with high radiation levels. These requirements are met thanks
to the high granularity and fast response of its silicon sub-detectors,
while keeping the amount of material as low as possible so as to limit
phenomena such as multiple scattering, bremsstrahlung and nuclear interactions.
It consists of a heart of 65 million pixel detectors, directly exposed to the
high intensity of particles: at 8 cm from the beam line the rate is ∼10 million
particles per square centimeter per second. The pixel detector therefore holds
a vital place in the reconstruction of very short-lived particles (such as beauty
hadrons).

Figure 2.5: Schematic view of the CMS tracking detector [29].

When a charged particle passes through a pixel, it deposits enough energy to
excite an electron of a silicon atom, creating an electron-hole pair.
Each pixel collects these charges on the surface by means of an electric field,
producing a small electrical signal that is recorded by an electronic chip.
Knowing which pixel has been "touched", we can reconstruct the particle
track. Since the detector is made of 2D pixels arranged in a number of
detection layers (3 layers in all), it is possible to build a 3D image of the
track. The silicon pixels provide a spatial resolution of about 10 µm for the
R-φ coordinates in the transverse plane and 20 µm for the z coordinate [25].
Immediately after the 3 layers of pixels, the particles pass through 10 layers
of strip detectors, out to a radius of 130 cm. The strips work in a similar
way to the pixels, and the charge collected in the form of an electrical pulse
is amplified and read out. The microstrips provide a resolution that depends on
the cell thickness, but is still better than 55 µm in the transverse
plane: in particular, the single-point resolution varies from 35 to 52 µm in
the R-φ direction and is about 52 µm in z [25].
The inner tracker and its electronics are constantly exposed to high radiation
but are designed to withstand it for ten years. However, to minimize
the damage, this part of CMS is kept at −20 °C to "freeze" any defects
and prevent their propagation.
During Run 1 some dynamic inefficiencies were observed in the pixel sub-detector,
of the order of 5% in the first barrel layer, due to the limited
readout bandwidth [30]. A new pixel detector was installed in
March 2017, equipped with one additional barrel layer and one additional
forward disk, as shown in figure 2.6: the innermost layer is closer to the
interaction point and the outermost one is further away from it.
Figure 2.6: Schematic drawing of the upgrade of the pixel detector. The
lower half shows the old pixel detector, the upper half represents the newly
installed detector [31].
An improvement of the efficiency is observed with the new pixel detector
with respect to 2016 data; the results are reported in figure 2.7.
The electromagnetic and hadronic calorimeters
A calorimeter in general has the main task of measuring the energy of the
particles passing through it, by means of a destructive process: the particle
interacts with the sensitive calorimetric material and produces a shower
that propagates and is stopped within the structure, allowing the collection of
a signal proportional to the energy of the incoming particle.
Electrons and photons, from prompt interactions or embedded in a hadron
or τ-jet, stop within the electromagnetic calorimeter (ECAL), precisely because
their main interaction channel is electromagnetic.
In CMS, the ECAL is a homogeneous calorimeter composed of lead tungstate
crystals (PbWO₄), very dense (8.28 g/cm³) and fast scintillating (80% of the light
is emitted within 25 ns). Each cell is 23 cm long (25.8 radiation lengths¹)
in the radial direction to ensure the complete development of the showers:
the result is a very compact, granular and radiation-resistant detector [25].
Although these crystals are mainly made of metal, the presence of oxygen
makes them highly transparent and "sparkly" when crossed by photons
or electrons: when these particles hit the heavy nuclei of the ECAL
crystals, they excite their electrons. When the atoms "relax", they emit photons
of blue-green wavelength, peaked at 420-430 nm, which
are collected and amplified by the photo-detectors glued to the back of each
crystal.

¹The depth over which the electron energy is reduced by a factor 1/e, due to
bremsstrahlung alone.

Figure 2.7: Hit efficiency of the pixel detector vs instantaneous luminosity
during Run 2, measured with 2017 data [30].
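The 25.8 radiation lengths quoted above follow directly from the crystal depth; in this check the radiation length of PbWO₄, X₀ ≈ 0.89 cm, is an assumed standard (PDG-style) material constant not stated in the text:

```python
import math

X0_pbwo4 = 0.89       # radiation length of PbWO4 [cm], assumed standard value
crystal_depth = 23.0  # ECAL crystal length [cm], from the text

n_X0 = crystal_depth / X0_pbwo4
print(f"{n_X0:.1f} radiation lengths")  # ~25.8, as quoted in the text

# Fraction of the initial electron energy surviving bremsstrahlung after
# the full crystal depth, using E(x) = E0 * exp(-x / X0):
print(f"{math.exp(-n_X0):.1e}")  # vanishingly small: full shower containment
```

The short radiation length is precisely what makes PbWO₄ crystals so compact for a homogeneous calorimeter.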
The electromagnetic calorimeter forms an intermediate layer between the
tracker and the hadronic calorimeter: in the latter, the energy of the particles
subject to the strong interaction is deposited.
In CMS, the HCAL is a sampling calorimeter, with layers of sensitive material
(fluorescent plastic scintillator) alternated with layers of dense steel or brass
absorber. When a hadron crosses the absorber, it interacts strongly with
the nuclei of the medium, producing secondary particles which
in turn interact, forming the hadronic shower. As the cascade develops, the
particles pass through the scintillator layers, causing the emission of light,
typically at blue-violet wavelengths. The scintillator shifts
it into the green spectral region, to which the photo-cathode windows
that convert it into an electrical signal are most sensitive.
As the HCAL must be thicker than the ECAL, since hadronic
showers are wider, the minimum amount of material needed to contain the
cascade is about 1 m. In order for such a structure to be hosted in a
"compact" detector like CMS, it is organized in 3 sections: barrel, endcap
and forward section. The barrel layers are in turn divided so that some
constitute the last detection layer inside the solenoid, while a few others are
external to it (the so-called outer calorimeter, HO) and improve the containment
of hadronic showers. Finally, two forward sections (HF) are positioned
at the two ends of CMS, about ±11 m away from the interaction point,
to collect the myriad of particles emerging from the collision region at shallow
angles with respect to the beam line. They are Cherenkov-based iron/quartz-fiber
calorimeters: the Cherenkov light emitted in the quartz is detected by
photomultipliers. HF ensures full geometric coverage for the measurement of
the transverse energy in the event [32].
The muon system
Only two kinds of particles manage to get beyond the HCAL: weakly interacting
particles, such as neutrinos, and muons. While neutrinos escape
detection and can be identified by means of missing energy measurements,
muons are charged particles which are tracked in the dedicated muon
chambers and whose momentum is determined from the curvature of their
tracks in the magnetic field.
The muon system covers the pseudorapidity region |η| < 2.4 and has
three main tasks: triggering on muons, identifying them, and improving the
charge and momentum measurements at high pT.
As muons are the protagonists of the thesis study presented here, a more
in-depth discussion of their detection system is given in the next section.
2.2.2 The CMS muon system
As its very acronym suggests, the most important task of CMS is the detection
of muons, which give a very clear signature of several new physics
processes. Muon detection is therefore a powerful tool for recognizing signatures
of interesting processes over the very high background rate expected at LHC.
Since muons can penetrate several meters of iron before interacting, unlike
most other particles they do not stop in the calorimeters. This is why the muon
chambers are placed in the outer layers of the cylinder.
The basic detection process used in the CMS muon system is gas ionization,
exploited using three different technologies: drift tubes (DTs), cathode strip
chambers (CSCs) and resistive plate chambers (RPCs), chosen on the basis of the
expected background² and the magnetic field. Figure 2.8 shows that the
background is higher in the endcaps, where the magnetic field is less intense.
Figure 2.8: Magnetic field map (left) and field lines (right) for a longitudinal
section of CMS, obtained with a 3.8 T magnetic field model [33].
Each of these detectors has a basic physical module called a "chamber".
The chambers are independently operating units, 1400 in total:
250 DTs and 540 CSCs track the position of the particles and provide a trigger,
while 610 RPCs form an additional trigger system that promptly decides
whether or not to keep the data of the acquired muon.
The use of these three different technologies defines three detector regions,
naturally set by the cylindrical geometry of CMS: the barrel at |η| < 0.9,
the overlap region at 0.9 < |η| < 1.2 and the endcap at 1.2 < |η| < 2.4.
The DTs are placed in the barrel, while the CSCs sit in the endcap disks that
close the ends of the barrel. The RPCs are divided between the two regions
and interleaved with the DTs and CSCs. All chambers are arranged to
maximize coverage.
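These η boundaries can be encoded in a small helper function; a minimal sketch (the function name and region labels are illustrative, not part of the CMS software):

```python
def muon_region(eta: float) -> str:
    """Classify a muon by pseudorapidity into the three regions of the
    CMS muon system defined by its cylindrical geometry."""
    a = abs(eta)
    if a < 0.9:
        return "barrel"    # DTs, plus RPCs
    if a < 1.2:
        return "overlap"   # DT/CSC overlap region
    if a < 2.4:
        return "endcap"    # CSCs, plus RPCs up to |eta| = 1.6
    return "outside"       # beyond the muon-system acceptance
```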
The term "station" refers to a set of chambers around a fixed value of the
2The background is mainly induced by a gas of neutrons (created by
hadron interactions with the material of the beam pipe), which produces mainly
photons, electrons and positrons.
Figure 2.9: Transverse section of a CMS quadrant, in the R-z plane. The
interaction point coincides with the origin of the axis system [34].
radial distance R (in the barrel) or of the distance along the z direction (in
the endcap). As can be seen in figure 2.9, there are 4 stations in the barrel,
MB1-MB4 (orange areas), and 4 in each endcap, ME1-ME4 (green areas),
interspersed with steel disks (dark gray areas). The RPCs (blue areas)
placed in the barrel are labelled RB1-RB4, those in the endcap
RE1-RE4. Along z, the DTs and RPCs in the barrel are divided into 5 "wheels":
Wheel 0 is centered at z = 0, Wheels W+1 and W+2 lie in the positive
direction of the axis and W-1 and W-2 in the negative one. In the same
way, in the endcaps the CSCs and RPCs are organized in "rings", labelled
ME1/n-ME4/n, where n increases with the radial distance from the beam
line. Figure 2.10 shows the amount of material thickness crossed by muons, as a
function of pseudorapidity.
Between Run 1 and Run 2, additional chambers were added in ME4/2 and
RE4 to increase redundancy (allowing the system to tolerate the loss or
failure of some components), improve efficiency and reduce the mis-identification
Figure 2.10: Material thickness in interaction lengths at various depths, as
a function of pseudorapidity [29].
rate. Furthermore, the trigger and read-out electronics were improved
as part of a larger trigger upgrade, including optical connections
between DT and CSC to increase bandwidth and ease maintenance; a
deeper segmentation was implemented in the scintillator layers of the
calorimeters using SiPMs [32] [10].
DT - Drift Tube chambers
The DT chambers are located in the barrel, where the magnetic field is mostly
uniform, with a strength below 0.4 T between the segments of the return yoke,
and where the muon rate is low. They cover the pseudorapidity region
|η| < 1.2 and are subdivided into 12 φ-segments forming 4
stations. Each station, on average 2 x 2.5 m, consists of 12 aluminum layers,
organized in three groups of four consecutive layers, each with a maximum of
60 tubes, called SuperLayers (SLs). Among these three groups, the intermediate
one measures the coordinate along the direction parallel to the beam line,
i.e. in the longitudinal plane (R-θ), while the two remaining SLs measure
the perpendicular coordinate in the bending plane (R-φ). The
chambers in MB4, however, have only these last two groups. A honeycomb structure
separates an R-φ SL from the other two SLs (figure 2.11).
These measurements exploit the ionization process in the gas.
Each tube, 4 cm wide, contains a gold-plated stainless-steel anode
Figure 2.11: DT chamber layout in a barrel station [25].
wire, at a voltage of +3600 V, within a volume of gas (85%/15% Ar/CO2
[33]). When a muon passes through the tube, it ionizes the gas atoms,
producing electron-ion pairs that move along the electric field lines. The
electrons drift to the positively charged wire and the induced
signal is read out by the electronics. By recording the arrival of the electrons
at the wire and computing the distance of the original muon from the anode wire
(exploiting the knowledge of the electron drift time3), the DTs provide the
two coordinates for the position of the muon (muon hit).
The maximum drift length is 2.0 cm and the single-point resolution is
about 200 µm [25].
CSC - Cathode Strip Chambers
The cathode strip chambers provide fast response times (thanks to the short
drift paths) and can be finely segmented, and they cope well with a non-uniform
magnetic field. For these reasons they are located in the disks
at the two ends of CMS, where the magnetic field is strong and non-uniform
and the muon arrival frequency is very high. They cover the pseudorapidity
region 0.9 < |η| < 2.4.
Each endcap has 4 chamber stations mounted perpendicular to the beam.
Trapezoidal in shape (figure 2.12), the chambers consist of 6 layers of positively
charged anode wire planes (2.9 kV for ME1/1 and 3.6 kV for the remaining
ones), crossed with negatively charged copper cathode strips, within a
volume of gas (50% CO2, 40% Ar and 10% CF4 [33]).
3Its value depends on the electron drift speed, which in turn depends on the
characteristics of the gas; in this specific case the drift speed is 55 µm/ns and the
maximum drift time is 400 ns [33].
Figure 2.12: Schematic view of a CSC chamber [25] (a) and orthogonal
sections of one CSC layer (b).
A muon crossing the gas volume produces ionization charges which, moving
toward the electrodes, create an electron-ion avalanche. This induces a charge
on the anode wires and an image charge on a group of cathode strips. Since
the strips and wires are perpendicular, the position of the particle is obtained
in two coordinates: the cathode strips provide a measurement
in the bending plane (R-φ), i.e. the azimuthal position at
which the muon crosses the gas volume, while the anode wires provide a
measurement in the radial direction.
The spatial resolution provided by each chamber is typically of the order of
100 µm [25].
CSCs of different sizes are used, ranging between 1.7-3.4 m in the radial
direction; while the CSCs in the innermost rings of stations 2, 3 and 4
subtend an angle of 20° in φ, all the others subtend an angle of 10°.
DT and CSC together cover the pseudorapidity interval |η| < 2.4,
guaranteeing good muon identification over the range 10° < θ < 170°.
A crucial property is that they can identify the bunch crossing that
generated the muon, trigger on the muon pT with good efficiency, and
reject background by means of temporal discrimination. Since the minimum
time interval between two bunch crossings is 25 ns, the muon chambers must
provide a fast, well-defined signal to trigger on muon tracks: to ensure an
unambiguous identification of the bunch crossing and a temporal coincidence
between track segments in the various muon stations, the signals must locally
have a temporal dispersion of a few ns, much smaller than the 25 ns spacing.
This is what is achieved in CMS.
RPC - Resistive Plate Chambers
RPCs, present both in the barrel and in the endcaps, are fast gaseous detectors
with excellent time resolution (1 ns), located in the pseudorapidity region
|η| < 1.6. They are double-gap chambers (see figure 2.13), consisting of two
pairs of parallel electrodes (anode and cathode) made of a highly resistive
plastic material (graphite-coated Bakelite), separated by a volume of gas
(95.2% Freon, 4.5% isobutane and 0.3% SF6).
Figure 2.13: Schematic view of a double-gap RPC structure. The read-out
strips in the barrel-region chambers run along the beam direction [25].
When a muon passes through the chamber, electrons are ripped from
the atoms of the gas and in turn hit other atoms, causing an "avalanche" of
electrons. The chambers operate in avalanche mode, i.e. they prevent the
formation of streamers and are suitable for operation even at high rates (up
to 10 kHz/cm2 [25]).
RPCs provide fast responses with good time resolution but with a coarser
spatial resolution than the CSCs and DTs. They therefore allow the correct
bunch crossing to be identified unambiguously; they also provide a muon
trigger complementary to that of the DTs and CSCs, supplying timing
information, since hits in multiple chambers are required to be simultaneous.
The time assigned to the muon hits, once the event has been collected, is
called the "offline time": for muons traveling at the speed of light, produced
in pp collisions and with the correct BX assignment, the offline time of the
hits in each chamber should be t = 0. Any deviation from 0 could be caused
by backgrounds such as cosmic rays, beam-induced background, chamber noise or
pile-up, or it could even be an indication of new physics, for example a heavy
charged particle moving slowly. More details are provided in Ref. [34].
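As a rough illustration of the last point, a particle with velocity βc arrives later than a light-speed muon over the same path by Δt = (d/c)(1/β − 1). A minimal sketch, with an assumed flight distance of 7 m (a typical order of magnitude for a muon station, chosen here only for illustration):

```python
C = 299_792_458.0  # speed of light [m/s]

def offline_time_offset_ns(distance_m: float, beta: float) -> float:
    """Arrival-time delay of a particle with velocity beta*c with respect to
    a light-speed muon over the same path: dt = (d/c) * (1/beta - 1)."""
    return (distance_m / C) * (1.0 / beta - 1.0) * 1e9

# Hypothetical heavy charged particle with beta = 0.7 reaching a station
# ~7 m from the interaction point (assumed distance): delay of ~10 ns,
# easily visible with the 1 ns RPC time resolution.
delay = offline_time_offset_ns(7.0, 0.7)
```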
2.2.3 The trigger system
The proton-proton collisions at the LHC are spaced by 25 ns, corresponding to
approximately 20 simultaneous pp collisions per bunch crossing at the LHC
design luminosity (10^34 cm^-2 s^-1). It is therefore impossible to store and
process, in such a short time interval, the large amount of data associated
with the high number of collected events. The maximum data rate that can be
managed by the DAQ acquisition system is approximately 100 GB per second, while
the event rate is largely dominated by pile-up events. It therefore becomes
necessary to reduce the rate drastically. This task is carried out by the
trigger system, also known as the "online" selection system.
The CMS experiment uses two trigger levels to decide whether an event is
provisionally accepted or rejected, using information from the sub-detectors.
The first trigger level, or L1, composed of dedicated hardware (ASICs,
FPGAs), selects events of interest and reduces the read-out rate from 40 MHz
to at most 100 kHz, the upper limit imposed by the CMS read-out electronics.
Given the very short time L1 has to make its decision (3.2 µs), it cannot
receive information from the whole detector; for this reason it uses only the
information from the calorimeters and the muon chambers, which are the fastest.
The CMS sub-detectors therefore themselves produce trigger inputs, called
trigger primitives. They are generated at the level of the front-end
electronics (FE) of the sub-detectors and processed in several steps before
being combined into a single piece of information, evaluated by a global
trigger that takes the final decision. In particular, for the muon system, the
DT and CSC FE triggers identify track segments from the hit information (that
is, the point of passage of the particle through one of the detector layers)
recorded in the different detection layers. This is done by pattern-recognition
algorithms that identify muon candidates and measure their momentum
from their curvature in the magnetic field of the return yoke between
measurement positions. The RPC triggers, instead, exploit the hit information
directly: the hits are sent by the FE to logic boards that compare them with
predefined patterns and identify the muon candidate.
The second trigger level, called the high-level trigger (HLT), is software-based:
it further refines the purity of the physics objects and reduces
the rate of collected events to around 400 Hz [35], using information from the
whole event. The selection is made in a way similar to that used in offline
analyses: for each event, objects such as leptons, photons and jets are
reconstructed and identified using criteria that keep only those events of
possible interest for data analysis, as will be discussed in the next chapter.
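The rejection factors implied by these numbers are easy to work out; a back-of-the-envelope sketch using only the rates quoted above:

```python
# Rates quoted in the text.
bunch_crossing_rate = 40e6   # Hz (25 ns bunch spacing)
l1_output_rate      = 100e3  # Hz, read-out electronics limit
hlt_output_rate     = 400.0  # Hz [35]

# L1 rejects a factor 400, the HLT a further factor 250:
l1_rejection  = bunch_crossing_rate / l1_output_rate
hlt_rejection = l1_output_rate / hlt_output_rate
# Overall, only ~1 crossing in 100 000 is kept for offline analysis.
total_rejection = bunch_crossing_rate / hlt_output_rate
```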
2.3 The Run 2 data taking conditions
The LHC was designed to produce on average 25 such pile-up interactions;
during Run 2, however, the LHC surpassed this goal, with on average 32
interactions in 2017 and 2018, and more than 50 interactions in short periods.
When the bunches of protons collide, multiple proton interactions occur. The
particles from the interaction of interest are thus recorded by CMS together
with particles from additional interactions, the so-called pile-up
interactions. Separating the particles of the interaction of interest from the
several hundred particles produced in pile-up interactions is one of the
challenges of every physics analysis of CMS data and, indeed, one of the
topics of the present work [36].
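The quoted pile-up multiplicities follow from μ = σ_inel · L / f_crossing. A minimal sketch using the minimum-bias cross section of figure 2.14; the number of colliding bunches and the revolution frequency are nominal LHC figures assumed here, not taken from this text:

```python
SIGMA_MB = 69.2e-27   # minimum-bias cross section [cm^2] (69.2 mb [26])
LUMI     = 1.0e34     # instantaneous luminosity [cm^-2 s^-1]
N_BUNCH  = 2808       # nominal number of colliding bunch pairs (assumed)
F_REV    = 11245.0    # LHC revolution frequency [Hz] (assumed)

crossing_rate = N_BUNCH * F_REV        # colliding crossings per second
mu = SIGMA_MB * LUMI / crossing_rate   # mean pile-up interactions per crossing
```

With these inputs μ comes out in the low twenties, consistent with the design figure of ~25 interactions per crossing.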
Interactions are distributed along the collision region over several cm in z;
the CMS tracker has a resolution in z better than 1 mm and usually associates
the charged tracks correctly with the individual, separated vertices [32].
Nevertheless, some confusion and overlap between different collision processes
is inevitable. Fortunately, softer interactions deposit very little energy
in the CMS calorimeters, while many of the events of real physics interest
deposit larger amounts. Discrimination by transverse-energy thresholds can
therefore be a valid strategy, and requires a clear separation of high-energy
deposits in the calorimeters from lower-energy ones in the surrounding regions.
Improving the calorimeters improves the performance of the trigger, and this
was one of the first objectives of the upgrade.
The pile-up discussed so far derives from interactions in the same
crossing as the interesting triggered event and is called "in-time pile-up";
it obviously depends on the bunch spacing. These are low-pT particles that
constitute the main source of hits in the tracker and also produce significant
energy deposits in the calorimeters. "Out-of-time pile-up" instead refers to
two sources: "early" pile-up, i.e. the energy left in the calorimeters by
bunch crossings preceding the crossing of interest, and "late" pile-up, i.e.
the energy from the successive bunch crossings [32][10]. In-time pile-up can
be observed in a single bunch crossing through the many collision vertices
reconstructed by the tracking system; an upgraded tracking system can be
designed with additional segmentation so as to associate more charged
particles with the correct interaction points. Out-of-time pile-up can occur
because the intrinsic response time of the sensor or of the electronics is
longer than the 25 ns between one bunch crossing and the next. If the
occupancy of a given channel is small, it is unlikely that another particle
passes through it close enough in time to contaminate the triggered bunch, so
increasing the segmentation of the detector is a way of fighting this type of
pile-up as well. Another source of out-of-time pile-up is the signal from very
slow particles, neutrons in particular, which have scattered many times in the
detector and may eventually deposit energy in an active element.
Figure 2.14 shows the mean number of interactions per bunch crossing
for the 2016, 2017 and 2018 pp runs at 13 TeV.
2.3.1 Future Perspectives
With the perspective of Run 3 and the future HL-LHC, more attention must
be paid to the damage due to the high ionizing radiation, as well as to the
rise of the background rate. The main source of radiation is the particles
produced in the pp collisions: the charged particles, especially pions,
ionize the detectors they pass through; they also undergo nuclear
interactions that produce cascades of particles which add to the radiation
load. The photons, mainly deriving from the decay of π0, interact in the
material of the beam pipe or in the tracking system to produce e+e− pairs, or
reach the calorimeters where they produce electromagnetic cascades.
The particles can also be back-scattered by the calorimeters or escape the
calorimetric shower, interacting with other components of the detector.
Interacting neutrons can in turn produce photons and electrons: a uniform
background of very low-energy neutrons, electrons and photons is created
within the detector volume, completely uncorrelated with the original bunch
structure of the collisions. The upgrade therefore foresees that the ECAL
crystals will operate at lower temperatures, to reduce radiation damage, and
that the read-out electronics will be replaced.
Figure 2.14: Pile-up distribution for the 2016 (a), 2017 (b) and 2018 (c) runs,
using the "CMS recommended" value of the minimum-bias cross section, 69.2 mb [26].
The radiation damage produced varies from one sub-detector to another. In
silicon detectors, the radiation produces defects in the silicon lattice,
resulting in a change of the electrical properties. In the PbWO4 crystal
calorimeters the main problem is the loss of transparency of the medium through
which the scintillation light passes. Over time the signals may therefore
decrease while the noise levels increase, compromising the performance of CMS.
There are in particular two cases where the radiation damage is severe enough
to require a true replacement of the damaged detectors before Run 3: one
involves the inner radius of the forward calorimeter (HF), which receives
enormous doses that could reduce the transmission of the photomultiplier
windows; the other concerns the inner layer of the pixel detector in the
barrel [32]. Moreover, to face the problem, the GEM (Gas Electron Multiplier)
detector technology was adopted, in particular triple-GEMs, which will be
installed in the muon stations, complementing the existing CSCs ME1/1 and
ME2/1: these detectors have a thin profile and the ability to operate well at
particle fluxes far above those expected in the forward region under HL-LHC
conditions. In past experiments GEMs have been shown to be robust and
reliable, with high longevity and high rate capability [10].
In the forward region the background particle rates are higher and the
magnetic bending power is much reduced (see figure 2.8), so the additional
forward muon detectors will increase the average number of muon hits along a
forward track up to about the level already present in the barrel muon region
of CMS.
Chapter 3
Muon Reconstruction and Identification at CMS
The analysis technique at CMS is characterized by two principal steps: an
"online analysis", carried out by the trigger levels mentioned in Chapter
2, and an "offline analysis", based on algorithms which, using specific
selection criteria, reconstruct and identify the physics objects:
leptons, photons and jets. In particular, I focused this study on muons,
given the crucial role they could play in searches for new physics scenarios
with the CMS experiment; the reconstruction and identification of muons are
therefore of central importance.
In this chapter I will dwell on these algorithms for medium-to-high
transverse momentum muons, while in the next chapter I will focus on
the selection techniques for low transverse momentum muons.
3.1 The offline muon reconstruction at CMS
The muon reconstruction is divided into 3 main steps:
• reconstruction of the hits and track segments in the muon system (local
reconstruction);
• track reconstruction, performed independently in the entire muon system and
in the inner silicon tracker;
• combination of these two sources of information and interpretation of the
muon candidate within a description of the overall event, using calorimeter
information too.
The local reconstruction starts with the determination of the hit positions in
the DT, CSC and RPC sub-systems, due to the passage of a muon (or other charged
particle). Hits within each DT and CSC chamber are then matched to
form straight-line track segments (track stubs). These are collected and
matched to generate seeds that are used as starting points for the track fit
of DT, CSC and RPC hits. The result is a reconstructed track in the muon
spectrometer, called a standalone muon, which is then matched with the tracks
reconstructed in the inner tracker to generate a global muon track, featuring
the full CMS resolution. The high-level muon physics objects are thus
reconstructed in a multi-faceted way, the final collection comprising three
different muon types: standalone, global and tracker muons (see Section
3.1.2). Finally, the information on the energy deposits in the calorimeters
and the tracker tracks are combined in the "isolation" observable (see
Section 3.3) for the three different muon collections.
Figure 3.1 shows an example event in which 4 muons are reconstructed,
involving all the main sub-detectors of CMS [28].
3.1.1 Reconstruction of the hits and the segments
Local reconstruction, as already mentioned, uses information from the
individual chambers (DT, CSC or RPC) to determine the passage of a muon through
the chamber. The precise location of each hit is reconstructed, starting from
the electrical signal read by the electronics, using different algorithms
depending on the detector technology used.
In a DT cell, the hit reconstruction determines the transverse distance between
the wire and the intersection, in the cell volume, of the muon trajectory with
the plane containing the wire in that specific detection layer. The algorithm
exploits the assumption of a constant drift velocity: the electrons produced by
the gas ionization when a muon crosses the cell are collected at the anode wire.
A time-to-digital converter (TDC) records their arrival time, denoted T_TDC.
This time is then corrected by the pedestal time, T_ped, and multiplied
by the electron drift speed v, to reconstruct the position of the hit in the DT,
according to the relation:

position = (T_TDC − T_ped) × v
Figure 3.1: Longitudinal (a, R-z plane) and transverse (b, R-φ plane) views
of a collision event in which 4 muons are reconstructed. The thin green lines
in the inner cylinder are the tracks reconstructed in the inner tracker, with
pT > 1 GeV/c; the lines that extend to the muon system are tracks
reconstructed using hits both in the inner tracker and in the muon system.
Three muons are identified by DT and RPC chambers, the fourth by CSCs. The
small black stubs in the muon system show fitted muon-track segments, while
the horizontal red ones in (a) indicate positions of RPC hits. The energy
deposits in the calorimeters are also shown: red bars for ECAL, blue bars for
HCAL [28].
where T_ped accounts for the time from the bunch crossing until the arrival of
the trigger decision at the read-out electronics. It includes the time of
flight (at the speed of light) along a straight line from the interaction
region to the center of the wire, the mean propagation time of the signal
along the wire, the generation of the trigger primitives and the processing
by L1. It also includes a wire-by-wire component that takes into account the
different signal paths within the chamber.
Figure 3.2 shows a T_TDC distribution measured for one super-layer.
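The position relation above can be sketched in a few lines, using the drift speed quoted in footnote 3 (a minimal illustration, not the CMS reconstruction code):

```python
DRIFT_SPEED_UM_PER_NS = 55.0  # electron drift speed in the DT gas [33]
MAX_DRIFT_CM = 2.0            # maximum drift length quoted in Section 2.2.2

def dt_hit_position_cm(t_tdc_ns: float, t_ped_ns: float) -> float:
    """Transverse distance of the muon track from the anode wire,
    position = (T_TDC - T_ped) * v, converted from um to cm.
    Physical values lie within the ~2 cm maximum drift length."""
    return (t_tdc_ns - t_ped_ns) * DRIFT_SPEED_UM_PER_NS * 1e-4

# e.g. a corrected drift time of 200 ns corresponds to 200 * 55 um = 1.1 cm
```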
The segment reconstruction works independently on the R-φ and R-z projections,
which are combined only at the end of the procedure to obtain 3D segment
information. Combining the hits of the R-φ SLs into a 2D segment, the position
resolution reached is about 70 µm. Further details can be found in Ref. [25].
Figure 3.2: Distribution of the signal arrival times, recorded by the TDC,
for all the cells of a single super-layer in a chamber. The continuous line
indicates the fit of the T_TDC rising edge to the integral of a Gaussian
function [37].

In a CSC layer, the hit reconstruction measures the position of the passing
muon from the combination of the information from the cathode strips and the
anode wires. As already mentioned, the wires in the CSCs are orthogonal to
the strips and are grouped in bunches of wires. A hit is therefore
reconstructed at the intersection point between the hit strips and the wire
groups.
The charge distribution due to the passage of a single charged particle is
typically spread over 3 to 5 strips. The simple approach is to obtain the
pulse height in each strip and then cluster the neighboring strips to
determine the most probable incidence position of the muon. Each of the 6
layers of the chamber is treated independently. A 2D hit is constructed at
every intersection of a 3-strip cluster and a group of wires, from the local
values of x and y, with uncertainties computed from the wire resolution
(w/√12, where w is the width of the group of wires). In a subsequent stage,
the hits in the 6 layers of the chamber are fitted to form a track segment.
Typical resolutions per layer are about 50 µm for ME1/1 and between 100
and 150 µm for the rest, as pictured in figure 3.3.
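The two ingredients just described, the charge-weighted strip centroid and the w/√12 wire-group uncertainty, can be sketched as follows (function names are illustrative, not CMS software):

```python
from math import sqrt

def csc_strip_centroid(strip_centers, pulse_heights):
    """Charge-weighted centre of a 3-5 strip cluster: an estimate of the
    muon crossing position along the strip-measured coordinate."""
    total = sum(pulse_heights)
    return sum(x * q for x, q in zip(strip_centers, pulse_heights)) / total

def wire_group_sigma(width):
    """Position uncertainty of a uniform distribution over a wire group
    of width w: sigma = w / sqrt(12)."""
    return width / sqrt(12.0)
```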
Figure 3.3: CSC spatial resolution per layer as a function of chamber type
[38].

While the CSC and DT chambers are multi-layer systems that allow track
segments to be built, the RPCs are single-layer. Since the ionization charge
from a muon can be shared by more than one strip, adjacent strips are grouped
together to form a cluster, and a hit is reconstructed as the center of
gravity of the strip cluster. In the barrel, where the strips are rectangular,
this point is simply the center of the rectangle. In the endcap the
calculation is more complicated; the assumption is that each group of strips
that is "on" results from a single particle crossing the chamber plane, and
that this crossing can take place anywhere, with flat probability, over the
area covered by the strip cluster.
The hit-reconstruction efficiency is calculated as the ratio of the number of
reconstructed hits to the number of expected hits. In Run 2, a
hit-reconstruction efficiency in the range 94-98% (see figure 3.4) and a
muon-segment reconstruction efficiency of 97% were achieved (see figure 3.5).
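The barrel center-of-gravity rule and the efficiency definition above can be sketched as (illustrative helpers, not CMS software):

```python
def rpc_cluster_hit(fired_strip_centers):
    """Barrel RPC hit: centre of gravity of the contiguous fired strips
    (for rectangular strips this reduces to the mean of the strip centres)."""
    return sum(fired_strip_centers) / len(fired_strip_centers)

def hit_efficiency(n_reconstructed, n_expected):
    """Hit-reconstruction efficiency as defined in the text."""
    return n_reconstructed / n_expected
```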
3.1.2 Track reconstruction
Figure 3.4: Hit reconstruction efficiency measured with the 2018 data in
DT (a), RPC barrel (b) and RPC endcap (c) chambers [38].

Figure 3.6 shows a slice of CMS that sketches the underlying logic of the 3
muon-reconstruction algorithms of CMS.
The tracker tracks are reconstructed as part of the CMS general tracking,
based on the Kalman-filter technique [39]; it uses an iterative approach,
executing a sequence of tracking algorithms, each with a slightly different
logic. After each iteration step, the hits that have been associated with a
reconstructed track are removed from the input set used in the next step;
whenever the filter incorporates a new measurement, the track parameters are
recalculated. The system is assumed to be linear, i.e. the track model between
two measurement planes is linear in the parameters. This approach maintains
high performance and reduces processing time.
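The measurement-update step of a Kalman filter can be illustrated in one dimension; a deliberately simplified sketch (the real CMS fit propagates a five-parameter track state between detector layers):

```python
def kalman_update(x_pred, p_pred, z, r):
    """Single 1-D Kalman-filter measurement update: each new hit refines
    the parameter estimate and shrinks its variance.
    x_pred, p_pred: predicted value and variance; z, r: hit and its variance."""
    k = p_pred / (p_pred + r)          # Kalman gain
    x_new = x_pred + k * (z - x_pred)  # pulled toward the measurement
    p_new = (1.0 - k) * p_pred         # variance decreases monotonically
    return x_new, p_new

# Folding in a sequence of hits (purely illustrative numbers):
x, p = 0.0, 100.0                      # vague initial estimate
for hit in (1.2, 0.9, 1.1):
    x, p = kalman_update(x, p, hit, r=0.04)
```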
The standalone tracks are built starting from groups of track segments
locally reconstructed in all the sub-detectors of the muon system along a
muon trajectory, without any constraint to the interaction point, by means of
the Kalman-filter technique, using all DT, CSC and RPC information.
Muon tracks are then built by combining the information from the two track
types in the sub-detectors. Two different approaches are used: Global Muon
and Tracker Muon.
Figure 3.5: The efficiency (in percent) of each CSC in the CMS endcap
muon detector to provide a locally reconstructed track segment, as
measured from 2017 data.

Figure 3.6: Slice of CMS and main reconstruction algorithms: the green
circle indicates the tracker track from the inner tracker; the red one the
standalone-muon track from the muon system. Different combinations of these
two approaches are performed by the Global Muon and Tracker Muon algorithms.

• Global Muon reconstruction: the tracks are built "outside-in": the
algorithm starts from a standalone reconstructed muon and extrapolates its
trajectory from the innermost muon station to the external surface of the
tracker, taking into account the energy loss in the material and multiple
scattering [25]. Each standalone track can then be compared with the tracker
tracks and matched with the best one. This is done by comparing the
parameters of the two tracks propagated onto a common surface, within a
certain region of interest. The determination of this region is based on the
track parameters and on the uncertainties of the corresponding extrapolated
track, obtained under the assumption that the muon comes from the interaction
point. The choice of the region of interest has a great impact on efficiency:
well-measured muons are reconstructed faster and more efficiently than poorly
measured ones. Within the region of interest, the initial candidates for the
muon trajectory are built from pairs of reconstructed hits, the two hits
coming from two different tracker layers: all possible combinations of pixel
and silicon-strip layers are exploited to achieve high efficiency. Starting
from these hits, the tracks are reconstructed using the Kalman-filter
technique, based on the standalone- and tracker-track information.
• Tracker Muon reconstruction: here the tracks are built "inside-out",
i.e. starting from the tracker tracks and assigning to each of them a
compatibility value with a muon in the muon system (including those tracks
not associated with any standalone track in the muon detector). Each tracker
track with pT > 0.5 GeV/c and total momentum p > 2.5 GeV/c is considered a
possible muon candidate and is extrapolated to the muon system, taking into
account the magnetic field, the expected average energy losses and the
multiple Coulomb scattering in the detector material. If at least one muon
segment (from DT or CSC hits) is matched to the extrapolated track, the
corresponding tracker track qualifies as a Tracker Muon.
The track-segment matching is performed in a local coordinate system (that
of the chamber), in which the local x is the coordinate measured in the R-z
plane and the local y is the one orthogonal to it. The segment and the
extrapolated track are considered matched if the distance between them in the
local x coordinate is less than 3 cm or if the value of the pull1 for the
local x is less than 4 [28].
1The pull is defined as the difference between the position of the matched
segment and that of the extrapolated track, divided by their combined
uncertainties.
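The matching criterion just described, |Δx| < 3 cm or pull < 4, can be sketched as follows (a hypothetical helper, not the CMS implementation):

```python
from math import sqrt

def segment_matches_track(x_seg, sigma_seg, x_track, sigma_track,
                          max_dx_cm=3.0, max_pull=4.0):
    """Tracker Muon arbitration in local chamber coordinates: the segment
    and the extrapolated track match if |dx| < 3 cm OR the pull (dx over
    the combined uncertainty) is below 4."""
    dx = abs(x_seg - x_track)
    pull = dx / sqrt(sigma_seg**2 + sigma_track**2)
    return dx < max_dx_cm or pull < max_pull
```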
The performance of the reconstruction algorithms is reported in Appendix B.
3.2 Muon identification at CMS
Particles detected as muons are produced in pp collisions from different
sources, which lead to different experimental signatures:
• Prompt muons: muons arising either from decays of W and Z bosons or of
promptly produced quarkonia states, or from other sources such as Drell-Yan
processes or top-quark production.
• Muons from heavy flavour: muons produced in the decay of a beauty or
charmed hadron, or of a τ lepton.
• Muons from light flavour: muons from decays in flight of light hadrons
(π and K) or, less frequently, from the decay of particles produced in
nuclear interactions in the detector material.
• Hadron punch-through: in this class most of the muon-chamber hits were
produced by a particle that was not a muon. The so-called "punch-through"
(i.e. hadron-shower remnants penetrating the calorimeters and reaching the
muon system) is the most common source of these candidates, although
"sail-through" (i.e. particles not undergoing nuclear interactions upstream
of the muon system) is present as well.
Physics analyses can set the desired balance between identification efficiency
and purity by applying a selection based on various muon-identification
variables (muon IDs). Such muon IDs are based on reconstruction-related
variables, such as the track-fit χ2, the number of hits per track (in the
inner tracker, in the muon system, or in both) and the degree of matching (a
value between 0 and 1) between the tracker track and the standalone track (for
"global" muons). An algorithm searching for "kinks" (a sudden change in the
direction of the track, interpreted as a charged particle decaying into a
neutral and a charged particle) breaks the tracker track into two separate
tracks at different points along the trajectory; for each division, the
algorithm compares the two separate tracks, and a high value of χ2 indicates
that they are incompatible with being a single track. Moreover, some muon IDs
exploit inputs external to the track of the reconstructed muon, such as the
compatibility with the primary vertex.
The main muon identification algorithms used in CMS physics analyses include:
• Loose muon ID: a "loose" muon is a muon selected by the Particle
Flow algorithm² (PF). Moreover, it is required to be either a tracker
muon or a global muon. The Loose ID aims to identify the muons from
the decays of heavy and light hadrons, while keeping a low rate of
misidentification of charged hadrons as muons. The Loose ID also allows
a prompt muon to be identified, by complementing the previous criteria
with requirements on the impact parameter.
• Tight muon ID: a "tight" muon is a "loose" muon with a tracker track
that uses hits from at least 6 layers of the inner tracker, including
at least one hit in the pixel detector. The muon candidate must be
reconstructed as both a tracker muon and a global muon. The tracker
muon must have segments matched in at least 2 muon stations, while
the global-muon fit must have χ²/dof < 10 and include at least one hit
from a muon chamber. A tight muon must be compatible with the primary
vertex, with a transverse impact parameter < 0.2 cm and a longitudinal
impact parameter < 0.5 cm.
With this selection, the rates of muons from decays in flight and from
punch-through are significantly reduced, at the price of an efficiency
loss of a few percent for prompt muons [28].
• Soft muon ID: it is optimized for low-pT (< 100 GeV/c) muons for B-physics
and quarkonia analyses. A "soft" muon is a tracker muon with
a tracker track that uses hits from at least 6 layers of the inner tracker,
including at least one hit in the pixel detector. This selection requires
a muon segment to be matched in both the x and y coordinates to the
extrapolated tracker track, such that the pull is less than 3.
A soft muon is loosely compatible with the primary vertex, with a
distance in the transverse plane |dxy| < 0.3 cm and |dz| < 20 cm [34].
²The Particle Flow algorithm uses the best combination of all CMS sub-detectors to
measure the properties of individual particles and to identify them as electrons, hadrons
or muons. In particular, muons are reconstructed using the information from the tracker
and the muon system: a tracker muon is built from the inner tracker and required to match
muon segments, while a global muon is required to have a combined track in both
sub-detectors. The tracks identified as muons are then flagged for the next step of the
reconstruction.
• High-pT muon ID: it is optimized for muons with pT > 200 GeV/c. This
object is reconstructed as both a tracker muon and a global muon. The
requirements on the tracker track, on the tracker muon and on the transverse
and longitudinal impact parameters are the same as for the "tight" muon, as
is the demand for at least one hit from the muon system in the global-muon
fit. However, unlike the Tight ID, the requirement on the χ²/dof of the
global fit is removed. This modification avoids inefficiencies at high pT,
where muons radiate wide electromagnetic showers as they pass through the
steel return yoke of the magnet, giving rise to additional hits in the muon
chambers.
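The Tight and Soft requirements listed above can be sketched as boolean selectors (a simplified model; field names are illustrative, not the CMS data format):

```python
def passes_tight_id(mu):
    """Hedged sketch of the Tight ID cuts listed above; 'mu' is an
    illustrative dict, not the actual CMS event content."""
    return (mu["is_pf"] and mu["is_tracker"] and mu["is_global"]
            and mu["tracker_layers"] >= 6 and mu["pixel_hits"] >= 1
            and mu["matched_stations"] >= 2
            and mu["global_chi2_ndof"] < 10.0 and mu["muon_hits"] >= 1
            and abs(mu["dxy_cm"]) < 0.2 and abs(mu["dz_cm"]) < 0.5)

def passes_soft_id(mu):
    """Sketch of the Soft ID: a tracker muon with a high-quality
    tracker track and an arbitrated segment match, loosely
    compatible with the primary vertex."""
    return (mu["is_tracker"] and mu["segment_matched"]
            and mu["tracker_layers"] >= 6 and mu["pixel_hits"] >= 1
            and abs(mu["dxy_cm"]) < 0.3 and abs(mu["dz_cm"]) < 20.0)

mu = {"is_pf": True, "is_tracker": True, "is_global": True,
      "segment_matched": True, "tracker_layers": 8, "pixel_hits": 2,
      "matched_stations": 2, "global_chi2_ndof": 1.3, "muon_hits": 5,
      "dxy_cm": 0.01, "dz_cm": 0.1}
assert passes_tight_id(mu) and passes_soft_id(mu)
```

A slightly displaced candidate (e.g. |dxy| = 0.25 cm) fails the Tight vertex cut while still passing the looser Soft one, which illustrates the efficiency/purity trade-off.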
3.3 Muon isolation
Since muons produced in Z, W and τ decays are expected to be isolated
in the detector, while leptons from heavy-flavour decays and from π and K
decays in flight are expected to lie inside jets, muon isolation is a
powerful tool to distinguish them. Figure 3.7 shows, as an example, the
performance of the Tight isolation algorithm as a function of the transverse
momentum.
Figure 3.7: Tight isolation efficiency vs pT [40].
The muon isolation is usually calculated using detector information from
the tracker and the calorimeters, by means of two different algorithms: the
Tracker relative isolation, a track-based isolation, and the Particle-flow
relative isolation, a particle-based isolation.
Tracker relative isolation: it computes the scalar sum of the pT of all
tracker tracks reconstructed in a cone of radius ∆R ≡ √((∆φ)² + (∆η)²),
centred on the muon track direction, where ∆η and ∆φ are the distances in
pseudorapidity and azimuthal angle between the deposit (the sum of the
transverse momenta of the reconstructed tracks) and the cone axis (see
figure 3.8).
Figure 3.8: Isolation cone, whose axis is the direction of the muon at the
vertex [25].
The HCAL and ECAL energy deposits in the cone are computed, and the muon
contribution to the energy measurement inside the cone is removed by
excluding a small area around the muon (the "veto value") from the cone.
Comparing the deposit in the cone with a predefined threshold determines
whether the muon is isolated. This algorithm has high efficiency and a small
dependence on pile-up [41].
Particle-flow relative isolation uses charged and neutral particles from the
particle-flow (PF) algorithm: the pT of the charged particles originating
from the primary vertex is summed together with the total energy of the
neutral particles in the same cone. The contribution of pile-up to the
neutral component is corrected for by computing the sum of the charged-hadron
deposits originating from pile-up vertices, scaling it by a factor of 0.5,
and subtracting it from the neutral-hadron and photon sums, to give the
corrected energy sum from neutral particles. The factor of 0.5 is estimated
from simulation to be approximately the ratio of neutral-particle to
charged-hadron production in inelastic proton-proton collisions [34].
The values for the tight and loose working points for PF isolation within
∆R < 0.4 are 0.15 and 0.25, respectively, while the values for track based
isolation within ∆R < 0.3 are 0.05 and 0.10 [34].
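The two isolation sums can be sketched as follows (a simplified model with pre-computed cone sums; the 0.5 pile-up factor and the working-point values are the ones quoted above, everything else is illustrative):

```python
def tracker_rel_iso(mu_pt, track_pts_in_cone):
    """Track-based relative isolation: scalar pT sum of the tracks in
    the ΔR cone (the muon's own track is assumed already excluded),
    divided by the muon pT."""
    return sum(track_pts_in_cone) / mu_pt

def pf_rel_iso(mu_pt, ch_had, neu_had, photons, pu_ch_had):
    """PF relative isolation with the pile-up correction quoted above:
    the neutral sums are reduced by 0.5 x the charged pile-up sum,
    clipped at zero."""
    neutral = max(0.0, neu_had + photons - 0.5 * pu_ch_had)
    return (ch_had + neutral) / mu_pt

mu_pt = 20.0
# 0.7 GeV of track pT in the cone: passes the tight track WP (0.05)
assert tracker_rel_iso(mu_pt, [0.4, 0.3]) < 0.05
# (1.0 + max(0, 1.5 + 0.8 - 0.5)) / 20 = 0.14: passes the tight PF WP (0.15)
assert pf_rel_iso(mu_pt, 1.0, 1.5, 0.8, 1.0) < 0.15
```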
In conclusion, the muon reconstruction creates muon objects, the
identification helps solving potential ambiguities at the analysis level,
and the isolation distinguishes muons from jets.
3.4 Performance of Muon Identification and Isolation Algorithms
The performance of the muon ID and isolation algorithms is studied with the
"tag-and-probe" method, starting from a tracker track as "probe". This
technique consists in selecting particles of the desired type (muons or
electrons) coming from the decay of di-object resonances of known mass, such
as the Z, Υ or J/Ψ. The "tag" is an object that passes a set of very tight
selection criteria, designed to isolate the required particle type. The
"probes" are selected by pairing them with the tags such that the invariant
mass of the combination is consistent with the mass of the resonance. The
probes are then used to measure the efficiency of a particular selection
criterion.
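A schematic tag-and-probe efficiency calculation, assuming simple event counting (actual CMS measurements fit the invariant-mass spectra of passing and failing probes to subtract background; the Z-mass window below is illustrative):

```python
import math

def in_mass_window(m_inv, lo=81.0, hi=101.0):
    """Illustrative Z-mass window used to pair a tag with a probe."""
    return lo < m_inv < hi

def tnp_efficiency(n_pass, n_total):
    """Efficiency of a selection on the probe sample, with a simple
    binomial uncertainty."""
    eff = n_pass / n_total
    return eff, math.sqrt(eff * (1.0 - eff) / n_total)

assert in_mass_window(91.2) and not in_mass_window(60.0)
eff, err = tnp_efficiency(980, 1000)   # 98% efficient selection
```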
Figure 3.9 reports the efficiency of the Loose and Tight identification
algorithms, for muons with pT > 20 GeV, measured in Z → µµ events.
The slight improvement of the 2018 performance with respect to 2017 is due
to the recovery of some CSC chambers.
The performances of the Loose and Tight working points are consistent
across the different years.
The drops visible in the Tight ID at around |η| = 0.2 are due to the gap
between two wheels of the muon system.
Figure 3.10 shows the Loose tracker isolation and the Tight particle-flow
isolation efficiencies, as a function of η, for muons with pT > 20 GeV. In
both cases there is reasonable agreement between data and simulation.
Figure 3.9: Tag-and-probe efficiency for identification in 2016-2017-2018
data for loose (a) and tight (b) muons, with pT > 20 GeV [41].
Figure 3.10: Loose tracker (left) and Tight PF (right) isolation efficiency
in 2017 data, for muons with pT > 20 GeV [41].
The performances shown so far are obtained for pT > 20 GeV, in Z → µµ
events. Figure 3.11 reports instead the performance of the Soft muon ID,
measured in J/Ψ → µµ events, for muon pT > 3 GeV.
Figure 3.11: Low-pT Soft ID efficiency vs η, using the tag-and-probe method.
The Soft muon ID is commonly used in analyses involving low-pT muons in the
final state, and exhibits high efficiency over the whole CMS acceptance.
Complementary information in this context concerns its efficiency on the
background, which will be investigated in the next chapter.
Chapter 4
Study of a low-pT muon identification algorithm with MVA techniques
4.1 Introduction
Muon reconstruction and identification algorithms in the CMS experiment
show excellent performance at medium pT (5-10 GeV), intermediate pT (10-200
GeV) and high pT (> 200 GeV). The purpose of my thesis was to extend
these muon identification algorithms to the low-pT region (< 3-4 GeV), for
possible future searches of new physics at CMS. As discussed in Section 1.6,
many interesting new-physics scenarios involve low-pT muons in the final
state, e.g. muons coming from the decay of heavier leptons with lepton
flavour violation, or from the decay of light bosons foreseen by SUSY theories.
A dedicated muon identification algorithm can be particularly useful when
the muons are characterized by a soft pT spectrum, since such muons are
more likely not to be fully reconstructed: the entire muon identification
chain in CMS, from detection up to reconstruction, has been optimized mainly
for muons coming from heavy-boson decays (W, Z, Higgs). The loosening of
the reconstruction requirements, needed for low-pT muon tagging, leads
to an increased probability of having a low-pT hadron incorrectly
reconstructed as a muon (fake rate). In this context, as will be shown in
section 4.4, two main sources of background can be identified:
• muons from K and π decays in flight;
• K, π and p produced with a non-negligible relativistic boost, reaching the
muon system and thus being misidentified as muons.
The former are real muons that can be distinguished from the prompt muons
only by studying the secondary vertex and its related variables measured
in the tracker. The latter, instead, must be distinguished using calorimetric
and muon-system related observables, and are the main focus of the work
developed in this thesis.
A feasibility study to discriminate signal muons from background ones has
been carried out using DS → τντ, τ → 3µ as a physics case, exploiting
Multi-Variate Analysis (MVA) and Machine Learning (ML) techniques
(described in appendix A). The study has been performed in four phases:
1. firstly, I studied the performance of the muon identification algorithms
commonly used in CMS, using dedicated muon-enriched datasets;
2. I investigated the background composition in searches involving low-pT
muons;
3. then I developed a new algorithm based on Machine Learning techniques,
combining several variables related to the muon reconstruction;
4. finally, I studied the performance of the new algorithm in dedicated
control samples, independent from the ones used for the training.
In this chapter the full study is presented, exploiting the Toolkit for
Multivariate Data Analysis (TMVA, see appendix A) with ROOT [42], a
data-analysis framework widely used in high-energy physics.
4.2 Monte Carlo simulation in CMS
To build a model that can discriminate well the signal muons from the
background ones, I used samples generated with the Monte Carlo (MC)
technique.
The MC events are generated with a computer simulation program for
high-energy collisions, PYTHIA [43], which includes an implementation of the
theoretical models underlying the proton-proton collisions and the emerging
particles. The particles emerging from the primary interactions, and their
decays, are stored as GenParticles, a data format that contains the type of
the particle: this information is saved according to the PDG Monte Carlo
numbering scheme [4] and is indicated by the PdgId variable for the particle
under examination, or with the prefix "Mother", referring to its parent
particle. A list of the PdgId numbers most used in this study is reported in
table 4.1.
Table 4.1: Main PdgId numbers used in this thesis.
PdgId(µ) 13
PdgId(τ) 15
PdgId(K) 321
PdgId(π) 211
PdgId(p) 2212
The GenParticles are then processed through a simulated detector based
on GEANT4 [44], a toolkit that simulates how particles propagate through
space, interact with the detector material and lose energy. The resulting
data are stored in the form of "simulated hits" (simHits), which contain the
information about the part of the detector in which the hit has been
generated, the particle type, the process in which the particle that caused
the hit was produced, the energy deposited in the detector unit, the entry
point in the local coordinate system, the momentum at entry and the time of
flight from the primary vertex. The simHits are then used to build the
related tracks (simTracks).
Starting from the positions of the hits and from the simulated energy losses
of a passing particle, a digitization phase follows, which simulates the
response of the readout electronics of the detectors, which must be as close
as possible to that of real data: the simHits are converted into digis. The
so-called pile-up interactions (see section 2.3), created by multiple
proton-proton collisions happening in the same bunch crossing, are then
superimposed on the hard-scattering interaction; they are simulated
separately and mixed in at the digitization step, to avoid repeating the
time-consuming detector simulation for every event.
From the digis, the reconstructed hits in each sub-detector are derived
(recHits), which contain the information about the energy and the position
for a single detector element. Sophisticated algorithms are then run on top
of the recHits to build the higher-level objects, such as tracks, muons,
electrons, photons and jets.
The reconstructed muons used in this study are built with the Tracker or
Global algorithms, described in Section 3.1.2.
4.2.1 Matching generated to reconstructed particles
To facilitate the interfacing between event generators, detector simulators
and the analysis packages used in particle physics, a set of Monte Carlo
truth information is stored in the reconstructed object, which keeps track of
the corresponding generated and simulated objects. In particular, an
association between the reconstructed muon and the simulated muon track is
performed hit by hit, both in the tracker and in the muon detectors. The
quality of the association is given by the degree of hit sharing between the
reconstructed and the simulated track. The reconstructed muon with the best
matching quality is then chosen to be associated to a given simulated track.
Since at the sim-track level the particle identity and the generator-level
kinematics are known, this information is also kept in the reconstructed
object.
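The hit-sharing association can be sketched as follows (the data layout is illustrative; CMS performs this with dedicated associator tools):

```python
def best_reco_match(sim_hits, reco_muons):
    """Sketch of the hit-by-hit association described above: each reco
    muon carries the set of (sub-detector, hit-id) pairs used in its
    fit; the match quality is the shared-hit fraction, and the reco
    muon with the best quality is assigned to the simulated track."""
    best, best_q = None, 0.0
    for reco in reco_muons:
        shared = len(sim_hits & reco["hits"])
        quality = shared / len(sim_hits) if sim_hits else 0.0
        if quality > best_q:
            best, best_q = reco, quality
    return best, best_q

sim = {("DT", 1), ("DT", 2), ("CSC", 7), ("TK", 3)}
mu_a = {"name": "a", "hits": {("DT", 1), ("TK", 3)}}
mu_b = {"name": "b", "hits": {("DT", 1), ("DT", 2), ("CSC", 7)}}
match, q = best_reco_match(sim, [mu_a, mu_b])
assert match["name"] == "b" and q == 0.75   # 3 of 4 sim hits shared
```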
4.3 Signal description: DS → τντ → 3µντ
The MC simulated sample used to study the signal muons is centrally
produced by CMS, with the conditions of the 2017 pp collisions and data
taking. The details of this dataset are reported in table 4.2; in the
following I will refer to this sample as the DsTau3mu sample.
Table 4.2: Signal MC dataset.
Dataset name: DsToTau_To3Mu_MuFilter_TuneCUEP8M1_13TeV-pythia8/RunIIFall17DRPremix-PU2017_94X_mc2017_realistic_v11-v1/AODSIM
Number of events: 3.6 × 10⁶
The LFV τ → 3µ decay has been chosen as a physics case here, though the
results achieved in this context could easily be extended to other
final-state topologies with the same kinematics.
The dataset is generated using PYTHIA 8, with a Minimum Bias configuration
(i.e. generic inelastic pp collisions with all the associated emerging
tracks), and the decay chain is set up with EVTGEN [45] as follows: if a DS
meson is found after hadronization, it is forced to decay to a τ, which is
subsequently set to decay to 3µ. The τ → 3µ process is assumed to be a
3-body phase-space decay.
Only generated events containing at least two generated muons within the
CMS acceptance are then processed with the time-consuming full simulation.
Figure 4.1 reports the generator-level pT and η distributions of the muons
coming from the τ decay, where the leading muon is the one with the highest
pT (green distribution) and the trailing muon is the one with the lowest pT
(blue distribution) among the three muons.
Figure 4.1: η (a) and pT (b) distributions of the 3 muons from the τ decay,
at generator level, within the CMS acceptance; the leading muon is the one
with the highest pT (green) and the trailing muon the one with the lowest pT
(blue) among the three muons.
The bulk of the pT spectrum of the signal muons lies below 10 GeV. In
particular the lowest-pT muon, the trailing one, peaks around 2 GeV. In the
context of this search it is crucial to preserve the highest detection
efficiency on all three muons: losing one of them due to an incorrect
identification would translate into a reduction of the signal acceptance and
thus of the sensitivity of the search for such a rare process (see section
1.6.1).
4.4 Study of the background composition
The DS → τντ → 3µντ sample has been created using the realistic conditions
of the pp collisions of the 2017 data taking. This means that each event of
interest from the primary interactions has been superimposed on the pile-up
interactions, which are a rich source of background muons.
In the following, muons from the τ decay will be referred to as "signal"
muons. Reconstructed muons not matched at gen-level with the muons from the
τ decay are referred to as "background". Background muons have been divided
into four categories, depending on the MC truth information:
1. muons (PdgId = 13) from K (Mother_PdgId = 321) decays: candidates
produced by a true muon from a decay in flight of a kaon;
2. muons (PdgId = 13) from π (Mother_PdgId = 211) decays: candidates
produced by a true muon from a decay in flight of a pion;
3. muons (PdgId = 13) from other decays: true muons produced by particles
other than kaons and pions. Most of these muons are produced in D and B
meson decays. A full list of the particles decaying into muons, together
with their occurrence in the MC sample, is reported in table 4.3;
4. non-muons (PdgId ≠ 13): the genParticles matched with these wrongly
reconstructed muons are mainly pions, protons and "unknown" (PdgId = 0),
when the reconstructed muon cannot be associated to any generator-level
particle. As discussed in section 4.2.1, this happens either when the
muon track is too soft and the generator-level information is not saved,
or when the number of hits shared with any simulated track is not
sufficient. A full list, together with the occurrence in the MC sample, is
reported in table 4.4.
Table 4.3: Percentage fractions of the gen-level particles decaying into
muons found in the DS → τντ → 3µντ MC sample.

Table 4.4: Percentage fractions of the gen-level particles with PdgId ≠ 13
associated to a reconstructed muon, as explained in section 4.2.1, with the
respective errors, in the DS → τντ → 3µντ MC sample.

The most relevant contribution to the background in the MC signal sample
comes from the fourth category, containing hadrons that reach the muon
system. The probability for a hadron to reach the muon system is very small,
because of the large energy loss due to nuclear interactions with the CMS
material budget. However, it turns out to be non-negligible with respect to
the signal, as will be discussed in the next section. This probability has
been computed as the ratio between the number of kaons, pions and protons
that are tagged as reconstructed muons and the total number of simulated
tracks associated to kaons, pions and protons that do not decay before
reaching the muon system. The results are reported in table 4.5.
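The ratio just defined can be sketched as follows (the counts below are invented purely to reproduce the pion order of magnitude of table 4.5):

```python
def tag_probability(n_tagged_as_muon, n_reaching_muon_system):
    """Ratio defined in the text: hadrons tagged as reconstructed
    muons over simulated hadron tracks that survive (do not decay)
    up to the muon system."""
    return n_tagged_as_muon / n_reaching_muon_system

# Illustrative counts reproducing the pion value of table 4.5 (4 x 10^-3):
p_pion = tag_probability(400, 100_000)
assert abs(p_pion - 4e-3) < 1e-12
```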
In order to compare the different categories of background, I studied the
Table 4.5: Probability for a hadron to be tagged as a muon.
Particle Probability
Pions 4 × 10⁻³
Kaons 2 × 10⁻³
Protons 4 × 10⁻⁴
distributions of the kinematic variables, shown in figure 4.2.
Figure 4.2: η (a) and pT (b) distributions of the reconstructed muons, split
into the four background categories.
The distributions show a significant background contamination in the low-pT
range, where a large part of the signal muons lies. As already mentioned,
the most abundant contribution comes from other particles erroneously
identified as muons (non-muons). Figure 4.3 shows the pT distribution of
each category of non-muons (as listed in table 4.4).
In the search for a very rare process, the important point is to maximize
the signal acceptance, while keeping the background level at the order of a
few percent. In the specific context of the τ → 3µ search, this means that
the muon identification algorithms should guarantee this performance even on
the trailing muon, to preserve the signal acceptance. In the present work I
developed a dedicated muon ID using Machine Learning techniques, targeting
low-pT muons and focusing on the non-muon category, it being the most
abundant.
Figure 4.3: pT distribution of the 4 categories of the non-muons. The
histograms are normalized to their area.
4.5 Performance of the standard muon ID algorithms
The figures of merit used to quantify the performance of the muon ID
algorithms are the efficiency and the fake rate.
The efficiency is defined as the ratio between the number of reconstructed
muons associated to a true muon at generator level that fulfil a given ID
algorithm and the total number of associated muons:

ε_sig = (# recoMuons associated to a true muon that pass the ID) / (# recoMuons associated to a true muon)
This quantity was computed using the muons coming from the τ → 3µ decay,
for three standard ID algorithms: the Soft, Loose and Tight ID (see figure
4.4).
The Soft and Loose muon IDs show an efficiency close to 100%, while the
Tight ID is inefficient, due to its strict requirements on the vertex.
The fake rate is defined as the ratio between the number of reconstructed
muon candidates not associated to a true muon at generator level that fulfil
the criteria imposed by a given identification algorithm and the total
number of non-associated muons:

fake rate = (# recoMuons not associated to a true muon that pass the ID) / (# recoMuons not associated to a true muon)
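Both figures of merit can be computed with a few lines, given per-candidate gen-matching flags and ID decisions (the format is illustrative):

```python
def id_performance(gen_matched, id_pass):
    """Efficiency and fake rate as defined above: 'gen_matched' flags
    reco muons associated to a true muon; 'id_pass' is the per-muon
    ID decision."""
    n_true = sum(1 for m in gen_matched if m)
    n_true_pass = sum(1 for m, ok in zip(gen_matched, id_pass) if m and ok)
    n_fake = len(gen_matched) - n_true
    n_fake_pass = sum(1 for m, ok in zip(gen_matched, id_pass) if not m and ok)
    eff = n_true_pass / n_true if n_true else 0.0
    fake_rate = n_fake_pass / n_fake if n_fake else 0.0
    return eff, fake_rate

matched = [True, True, True, False, False]
passed  = [True, True, False, True, False]
eff, fr = id_performance(matched, passed)
assert abs(eff - 2 / 3) < 1e-12 and abs(fr - 0.5) < 1e-12
```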
Figure 4.4: Soft, Loose and Tight efficiency vs η (a) and pT (b), for muons
from the τ → 3µ decay.
It quantifies the efficiency of a given muon identification algorithm on the
non-muon background or, in other words, the false-positive rate. The fake
rate for non-muons has been evaluated for the Soft, Loose and Tight IDs and
is reported in figure 4.5.
Figure 4.5: Soft, Loose and Tight fake rates vs η (a) and pT (b).
It can be noticed that the fake rate is on average 50% for the Soft ID and
40% for the Loose ID. The Tight ID shows a very small fake rate (about 5%),
but on the other hand it is characterized by a small efficiency on the
signal.
In addition, I defined ε(K,π→µ) as the probability for muons coming from K
and π decays to pass the Soft, Loose and Tight ID criteria; the computed
values are reported in figure 4.6.
Figure 4.6: Soft, Loose and Tight fake rates vs η (a) and pT (b).
The probability for a hadron to be fully identified as a muon is given by
the convolution of the probability for hadrons to reach the muon system
(table 4.5) with the muon ID fake rates shown in figure 4.5.
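As a worked order-of-magnitude example of this convolution (using the pion entry of table 4.5 and the approximate 50% Soft-ID fake rate quoted above):

```python
# Order-of-magnitude estimate only; the two factors are the approximate
# values quoted in the text for pions.
p_reach = 4e-3      # probability for a pion to reach the muon system
p_soft_fake = 0.5   # approximate Soft-ID fake rate on such candidates
p_total = p_reach * p_soft_fake
assert abs(p_total - 2e-3) < 1e-12   # roughly 2 per mille overall
```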
It is important to underline that in the search for very rare processes,
such as τ → 3µ, the background coming from standard-model processes is
O(10⁷) with respect to the expected signal (see section 1.6.1). The aim of
this study is to significantly reduce the overall fake rate by implementing
a new muon ID algorithm.
4.6 Implementation of the MVA discriminator
As discussed in section 4.1, searches for new particles or for LFV decays
involving low-momentum muons in the final state have to deal with a
background that is typically orders of magnitude higher than the signal.
Therefore it is crucial to identify the final-state muons in the best
possible way. This means that the algorithm dedicated to the muon
identification should be characterized by:
• high efficiency, as close as possible to 100%;
• negligible fake rate.
As shown in the previous paragraph, an irreducible source of background for
these searches is given by the semi-leptonic decays of heavy- and
light-flavour mesons (K, π, but also D and B mesons): this background can be
removed by topological requirements applied at the analysis level, and is
not studied in the present work. On the other hand, a relevant contribution
to the background comes from the light-flavour mesons erroneously identified
as muons. The standard muon identification algorithms are not optimized for
the ultra-soft pT range (< 3-4 GeV), the fake rate being ∼50% for the Soft
ID and 40% for the Loose ID (see figure 4.5).
The idea is to develop a multivariate discriminator, combining several input
variables describing the quality of the muon reconstruction, the energy
deposits in the calorimeters and the timing information, to distinguish
low-momentum muons from pions and kaons, with very high efficiency and a
fake rate of a few percent.
This is a binary classification problem, which will be treated with Machine
Learning (ML) techniques, by training proper discriminators. These are
sophisticated techniques which use input information from multiple variables
from various sources (called features), automatizing the Multivariate
Analysis (MVA): they are trained on a data sample in order to make
predictions on unknown datasets and learn how to classify new data. They
provide a "response" that is the result of the discrimination between the
event of interest (defined as signal) and the background.
There are many possible classification techniques, or classifiers: the two
used in this analysis are the Boosted Decision Tree (BDT) and the Multilayer
Perceptron (MLP); see appendix A.
Boosted Decision Tree
The BDT is a binary tree classifier with a root node, from which a sequence
of splits unfolds. Each split uses the variable that, at that node, gives
the best separation between signal and background when cut on. Since a cut
that selects predominantly background is as important as one that selects
signal, the criterion is symmetric with respect to the two event classes.
The node splitting stops once it has reached the minimum number of events,
the maximum number of nodes or the maximum depth. The leaf nodes at the
bottom end of the tree are labelled signal or background, depending on the
majority of the events that end up in the respective nodes [46].
Boosting a decision tree (a recursive algorithm applied to reweighted
(boosted) versions of the training data) extends this concept to several
trees, forming a forest: the trees are derived from the same training
ensemble by reweighting the events, and are finally combined into a single
classifier, given by a (weighted) average of the individual decision trees.
The boosting used in this study is AdaBoost (see appendix A).
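A minimal, self-contained AdaBoost sketch on one-dimensional decision stumps (not the TMVA implementation; the names and the toy dataset are illustrative) showing the event reweighting and the weighted vote described above:

```python
import math

def train_stump(xs, ys, ws):
    """Best single-threshold cut on a 1-D feature (weighted error)."""
    best = (None, 1, float("inf"))      # (threshold, polarity, error)
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, ws)
                      if (pol if x >= thr else -pol) != y)
            if err < best[2]:
                best = (thr, pol, err)
    return best

def adaboost(xs, ys, n_trees=5):
    """AdaBoost: boost misclassified events, combine stumps with
    weight alpha = 0.5 * ln((1 - err) / err)."""
    ws = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_trees):
        thr, pol, err = train_stump(xs, ys, ws)
        err = max(err, 1e-9)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((thr, pol, alpha))
        ws = [w * math.exp(-alpha * y * (pol if x >= thr else -pol))
              for x, y, w in zip(xs, ys, ws)]
        norm = sum(ws)
        ws = [w / norm for w in ws]
    return ensemble

def predict(ensemble, x):
    s = sum(a * (p if x >= t else -p) for t, p, a in ensemble)
    return 1 if s >= 0 else -1

# Separable toy data: signal (+1) at high x, background (-1) at low x.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [-1, -1, -1, 1, 1, 1]
model = adaboost(xs, ys)
assert all(predict(model, x) == y for x, y in zip(xs, ys))
```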
Multilayer Perceptron
The MLP is a type of artificial neural network which, by applying an
external signal to some binary on/off inputs, is put into a defined state
that can be measured from the response of one or more binary outputs.
The basic unit of computation is a neuron, called a Linear Threshold Unit
(LTU), in which each input connection is associated with a weight, which
tells the neuron to respond more to one input and less to another. The LTU
computes a weighted sum of its inputs and passes it to a non-linear
function, called the Activation Function, which takes a single number and
performs a fixed mathematical operation on it: if the result exceeds a
threshold, it outputs the positive class, otherwise the negative one.
The Perceptron is composed of a single layer of LTUs, with each neuron
connected to all the inputs, plus a "bias neuron" which constantly outputs 1
(the input layer), and of a single layer of LTUs that provide a response
(the output layer). For every output neuron that produces a wrong
prediction, the training reinforces the connection weights from the inputs
that would have contributed to the correct prediction.
The Multilayer Perceptron was introduced to improve the Perceptron
performance. It is composed of an input layer, one or more layers of LTUs,
called hidden layers, and one output layer.
The behaviour of the network is determined by the layout of the neurons, by
the weights of the inter-neuron connections and by the response of the
neurons to the input, described by the neuron response function ρ, which
maps the neuron input onto the neuron output.
An MLP learns by means of an iterative algorithm which computes the output
of every neuron in the net and measures the network's output error, i.e. the
difference between the desired output and the actual output of the network.
Its purpose is to minimize this error by acting on the choice of the
weights. This algorithm is commonly called Back-propagation (BP): for each
training instance, the algorithm feeds it to the net and computes the output
of every neuron in each consecutive layer (the forward pass). Then it
measures the network's output error and computes how much each neuron in the
last hidden layer contributed to each output neuron's error. It then
proceeds to measure how much of these error contributions came from each
neuron in the previous hidden layer, and so on, until the algorithm reaches
the input layer. This reverse pass efficiently measures the error gradient
across all the connection weights in the network, by propagating the error
gradient backwards through the network [47]. The mathematical details are
given in appendix A.
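The forward and backward passes described above can be sketched with a tiny network (a 2-4-1 layout with sigmoid activations, trained on the XOR toy problem; the sizes, learning rate and seed are arbitrary illustrative choices, not the TMVA MLP):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2 inputs -> 4 hidden "LTUs" -> 1 output, with bias terms.
W1, b1 = rng.normal(0, 1, (4, 2)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (1, 4)), np.zeros(1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0, 1, 1, 0], float)            # XOR targets

def forward(X):
    h = sigmoid(X @ W1.T + b1)               # forward pass: hidden layer
    out = sigmoid(h @ W2.T + b2).ravel()     # forward pass: output layer
    return h, out

def step(lr=1.0):
    """One back-propagation step on the mean squared output error."""
    global W1, b1, W2, b2
    h, out = forward(X)
    d_out = (out - y) * out * (1 - out)          # output-layer error term
    d_h = (d_out[:, None] @ W2) * h * (1 - h)    # error propagated backward
    W2 -= lr * d_out[None, :] @ h
    b2 -= lr * d_out.sum()
    W1 -= lr * d_h.T @ X
    b1 -= lr * d_h.sum(axis=0)
    return float(((out - y) ** 2).mean())

losses = [step() for _ in range(2000)]
assert losses[-1] < losses[0]    # the output error decreases with training
```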
4.6.1 Signal and Background definition
In order to train the discriminators in a binary classification problem, one
has to provide background and signal samples with large statistics.
The signal in this context consists of the muons from the τ → 3µ decay,
taken from the MC dataset detailed in Section 4.3.
From the results shown in Section 4.4 it is clear that the main sources of
background are kaons, pions and protons. In order to provide a sufficient
number of background examples for the training phase, I generated dedicated
MC datasets, the so-called "particle guns".
The particle-gun samples were produced without pile-up, with PYTHIA 8, for
the following particle types:
• pions: π± → X, with 499950 events;
• kaons: K± → X, with 498000 events;
• protons: p/p̄, with 58440 events;
where X includes all the possible decay products, according to the
branching ratios. Each event in these samples consists of two generated
particles, one particle and its corresponding antiparticle.
They are generated with a flat transverse momentum between 0 and 25 GeV
and a flat pseudorapidity between -2.5 and +2.5, as shown in figure 4.7, in
accordance with the end point of the spectrum reported in figure 4.3.
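A minimal sketch of such a gun, assuming (as the text suggests) one particle/antiparticle pair per event with flat pT and η (whether the two legs share the same kinematics is an illustrative choice, as are the field names):

```python
import random

random.seed(1)

def particle_gun_event(pdg_id, pt_max=25.0, eta_max=2.5):
    """One gun event: a particle plus its antiparticle, with uniform
    pT in [0, pt_max] GeV and uniform η in [-eta_max, eta_max]."""
    pt = random.uniform(0.0, pt_max)
    eta = random.uniform(-eta_max, eta_max)
    return [{"pdgId": pdg_id, "pt": pt, "eta": eta},
            {"pdgId": -pdg_id, "pt": pt, "eta": eta}]

event = particle_gun_event(211)          # a π+/π− pair (PdgId = 211)
assert event[0]["pdgId"] == -event[1]["pdgId"]
assert 0.0 <= event[0]["pt"] <= 25.0 and abs(event[0]["eta"]) <= 2.5
```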
In order to cross-check the performance of the muon ID algorithms on these
samples, the fake rates are reported in figure 4.8, as a function of the
kinematic variables. The efficiency for muons coming from K or π decays is also
Figure 4.7: η (a) and pT (b) distributions of the kaon, pion and proton
guns, at generator level, within the CMS acceptance.
Figure 4.8: Soft, Loose and Tight ID fake rates vs η (a) and pT (b).
computed, and the results are shown in figure 4.9.
These results are compatible with the ones shown in figure 4.5.
I performed the same study on the standard reconstruction algorithms, Global
muon and Tracker muon, since the related variables will then be fed to the
discriminator. The results are reported in appendix B.
Figure 4.9: Soft, Loose and Tight ID fake rates.
The following selections are imposed on signal and background:
- for the signal sample:
  - the reconstructed muon is matched at generator level with a genuine muon and comes from the decay of the tau lepton;
  - region of interest: pT < 6 GeV;
  - distance in the transverse plane, dxy, for true muons < 2 cm, to select tracks that are compatible with the nominal interaction point.
- for the background sample:
  - the reconstructed muon does not match a genuine muon at generator level;
  - distance in the transverse plane dxy < 2 cm.
Only muon candidates with pT < 6 GeV are considered in this study and
participate in the training for both signal and background.
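As a sketch, the selections above can be expressed as a single filter function; the dictionary keys are hypothetical names, not the actual ntuple branches:

```python
def passes_selection(cand, is_signal):
    """Apply the training selections. `cand` uses illustrative keys:
    'pt' (GeV), 'dxy' (cm), 'gen_matched' (matched to a genuine
    generator-level muon), 'from_tau' (that muon comes from a tau decay)."""
    if cand["pt"] >= 6.0:      # region of interest: pT < 6 GeV
        return False
    if cand["dxy"] >= 2.0:     # compatibility with the interaction point
        return False
    if is_signal:
        return cand["gen_matched"] and cand["from_tau"]
    return not cand["gen_matched"]  # background: no generator-level match
```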
4.6.2 Input Discriminating variables
A set of 24 variables is used to train the discriminator, chosen on the basis of
their discriminating power between genuine and fake muons. All the variables
are related to the capability to discriminate a minimum ionizing particle
(MIP) from a hadron, which also undergoes nuclear interactions.
The variables can be divided into 5 groups:
1. Tracker-track reconstruction related variables
- Muon_trackerLayersWithMeasurement: number of tracker layers fired by the track of the muon candidate that have been used in the fit of the tracker track;
- Muon_innerTrack_normalizedChi2: χ2 of the inner track fit divided by the number of degrees of freedom of the fit;
- Muon_Numberofvalidpixelhits: number of valid hits in the pixel detector.
2. Muon-system reconstruction related variables
- Muon_isTracker: Tracker muon reconstruction flag;
- MuonTrkKink: kink algorithm¹ applied to the global muon's inner track;
- MuonTrkRelChi2: sum of the χ2 estimates of the hits in the tracker with respect to the Global muon track;
- Muon_numberOfMatchedStations: number of muon stations containing segments matched with a tracker track;
- Muon_outerTrack_normalizedChi2: χ2 of the fit of the muon track as it is reconstructed in the muon system (standalone muon), divided by the number of degrees of freedom of the fit;
- Muon_outerTrack_muonStationsWithValidHit: number of stations in the muon system with valid hits (> 0);
- Muon_segmentCompatibility: compatibility of the track as it is reconstructed in the tracker with the muon hypothesis (∈ [0, 1]); it evaluates which crossed stations have matching muon segments and assigns a probability.
1 The kink algorithm takes the difference in φ between the predicted track position and the
actual recHit, squares it and divides it by the error in φ of the recHit position. The final
value is the sum of these contributions over all recHits.
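A minimal sketch of the kink estimate described in the footnote (an illustration of the idea, not the exact CMSSW implementation):

```python
def trk_kink(predicted_phi, rechit_phi, rechit_phi_err):
    """Kink estimate: for each recHit, square the difference between the
    predicted and measured phi, divide by the recHit phi uncertainty, and
    sum the contributions over all recHits."""
    return sum((p - m) ** 2 / e
               for p, m, e in zip(predicted_phi, rechit_phi, rechit_phi_err))
```

A track whose predictions match the recHits perfectly gives zero; large kinks (e.g. from an in-flight decay) inflate the sum.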
3. Overall muon reconstruction quality variables
- Muon_isGlobal: Global muon reconstruction flag;
- Muon_QInnerOuter: product of the electric charge of the Tracker track and the electric charge of the standalone muon track;
- Eta_diff = (Muon_outerTrack_eta - Muon_innerTrack_eta): difference in η between the track position in the muon system and in the tracker;
- Phi_diff = (Muon_outerTrack_phi - Muon_innerTrack_phi): difference in φ between the track position in the muon system and in the tracker;
- Muon_combinedQuality_chi2LocalMomentum: a scalar resulting from the matrix-vector product v^T M v, where v is the difference between the momentum vectors of the Standalone muon track and the Tracker track at the common surface and M is the related error matrix;
- MuonChi2LocalPosition: a scalar resulting from the matrix-vector product v^T M v, where v is the difference in position of the state of the trajectory on the surface of the first muon station between the Standalone muon and the Tracker track and M is the related error matrix;
- MuonGlbTrackProbability: probability of the χ2 associated to the global track refit being larger than the one observed;
- Muon_combinedQuality_glbKink: kink algorithm, calculating the Tracker kink applied on the Global track. In detail, I used this quantity as Log(2 + Muon_combinedQuality_glbKink), in order to avoid a large loss of muons after a pre-selection cut, due to the long tail of the distribution of this variable;
- Muon_combinedQuality_globalDeltaEtaPhi: squared difference in η and φ of the Standalone muon track and the Tracker track on the common surface during track matching.
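Two of the quantities above can be sketched directly: the v^T M v quadratic form behind chi2LocalMomentum and chi2LocalPosition, and the Log(2 + glbKink) transformation. Both functions are illustrative:

```python
import math
import numpy as np

def quadratic_form(v, M):
    """Scalar v^T M v: v is the Standalone-vs-Tracker difference vector and
    M the associated matrix (for a chi^2, M is the inverse error matrix)."""
    v = np.asarray(v, dtype=float)
    M = np.asarray(M, dtype=float)
    return float(v @ M @ v)

def glb_kink_feature(glb_kink):
    """Log(2 + glbKink) transformation used to tame the long tail of the
    glbKink distribution before the pre-selection cut."""
    return math.log(2.0 + glb_kink)
```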
4. Timing information
- Muon_timeAtIpInOut: time of arrival (time of flight) at the muon system for muons moving inside-out, assuming β = 1;
- Muon_timeAtIpInOutErr: uncertainty in the time of arrival at the muon system for muons moving inside-out, assuming β = 1. This is not the uncertainty of the measurement, but the sum of the differences between the measurements and the measurements weighted with their uncertainties.
5. Calorimetric information
- Muon_calEnergy_had: energy deposits in HCAL, compatible with the muon track both in the tracker and in the muon system.
In addition, the η and φ variables are also provided to the discriminator, in
order to exploit possible trends and correlations of the muon reconstruction
with a given subdetector or a particular spatial region in CMS.
Figure 4.10 shows the distributions of the main discriminating variables, for
the signal (true muons) and background (fake muons) samples after the selections
reported in section 4.6.1: the blue histograms are the background and the
red ones the signal.
The linear correlation between the input variables has been evaluated and
the results can be visualized in figure 4.11.
The two matrices are quite similar, except that, in the background sample,
the variables trkRelChi2 and innerTrack_normalizedChi2 are highly correlated
with TrkKink. However, since these variables are highly discriminating and
the performance of the final output is not degraded by the correlation, they
are kept in the training.
The high correlation between Muon_segmentCompatibility and
Muon_numberOfMatchedStations is due to the fact that the matched stations
are included in the definition of the segment compatibility.
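The linear correlation matrix can be reproduced outside TMVA with a few lines (a sketch; X is any array with one row per muon candidate and one column per input variable):

```python
import numpy as np

def linear_correlation_matrix(X):
    """Linear (Pearson) correlation matrix among the input variables,
    the same quantity TMVA reports for signal and background samples."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
```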
4.6.3 Algorithms setup and configuration
The overtraining of a given method is very common in ML and happens when
the number of features is too large, the model structure does not really
conform to the shape of the data, or the model is trained on limited statistics:
in this case the outcome of the training corresponds too closely to the input
dataset and may fail to fit additional new data and reliably predict
Figure 4.10: Distribution of some input discriminating variables,
normalized to their area.
future observations.
One can intuitively understand overtraining from the fact that informa-
tion from all past experience can be divided into two groups: information
that is relevant for the future (or the prediction) and irrelevant information
("noise").
Possible solutions are: to simplify the model by selecting one with fewer pa-
rameters; to gather more training data; to reduce the noise in the training
Figure 4.11: Correlation matrix among input discriminating variables, for
the signal (a) and for the background (b), provided by TMVA.
data (e.g. fix data errors and remove outliers).
Therefore the configuration of the BDT and MLP algorithms is chosen so as
to avoid overtraining without losing performance. Tables 4.6 and 4.7 report
the parameters of the MLP and BDT algorithms, respectively, as implemented
in the TMVA tool. Both the default configuration and the final one chosen
after the optimization phase are reported.
Table 4.6: MLP configuration: the names of the parameters correspond to
the names used by the TMVA tool.

Parameter        | Default value  | Final value   | Description
NeuronType       | sigmoid        | tanh          | neuron activation function
VarTransform     | None           | Normalisation | list of the variable transformations performed before training
NCycles          | 500            | 500           | number of epochs: the number of training cycles necessary to achieve a sufficiently good training of the network
HiddenLayers     | 2 HL (N, N-1)  | 1 HL (N+5)    | specifies the hidden-layer (HL) architecture: the number of HLs in the network and the number of neurons in each of them
TestRate         | 10             | 5             | number of training cycles between two tests; an overtraining test is performed every #th epoch, so this parameter sets the test frequency
ConvergenceTests | -1             | 1             | number of subsequent convergence tests which have to fail to consider the training completed (< 0 means the automatic convergence check is turned off)

In figure 4.12 the architecture of the MLP network with all the chosen parameters is shown.
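A minimal sketch of the chosen network topology, a single tanh hidden layer with N+5 neurons for N inputs; the weights here are random placeholders, not the trained ones:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer perceptron with tanh neurons,
    mirroring the topology chosen in table 4.6."""
    h = np.tanh(W1 @ x + b1)   # hidden layer activations
    return float(W2 @ h + b2)  # single output node

n_in = 26                      # 24 discriminating variables plus eta and phi
n_hidden = n_in + 5
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=n_hidden), 0.0
score = mlp_forward(np.zeros(n_in), W1, b1, W2, b2)
```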
4.6.4 Performance of the discriminators: Kolmogorov-
Smirnov test, response and ROC curve
In this section all the results of the training phase are reported, comparing the
ROC curves and the outputs of the algorithms, together with the considerations
about the best method to be used. The training of the algorithms has been
performed merging the two background guns, pions and kaons, to increase the
statistics.
As a general rule, a good way to know if an algorithm will work is to try it
Table 4.7: BDT configuration: the names of the parameters correspond to
the names used by the TMVA tool.

Parameter            | Default value | Final value | Description
NTrees               | 800           | 1000        | number of trees in the forest
MinNodeSize          | 5%            | 5%          | minimum number of events (in percentage) required in a leaf node; the value is relative to the total event sample size, i.e. all events used in the training
MaxDepth             | 3             | 2           | maximum depth of a tree
BoostType            | AdaBoost      | AdaBoost    | boosting algorithm for the trees in the forest: in order to make the decision trees robust against statistical fluctuations of the training sample, boosting, i.e. reweighting, is applied to the training sample
AdaBoostBeta         | 0.5           | 0.5         | learning rate η for the adaptive boost algorithm
UseBaggedBoost       | False         | True        | only a randomly chosen sub-sample of the event sample is used for boosting
BaggedSampleFraction | 0.6           | 0.6         | fraction of the sub-sample size used for boosting, relative to the event sample size
SeparationType       | GiniIndex     | GiniIndex   | criterion chosen for node splitting
nCuts                | 20            | 29          | number of cut values in a grid adapted to the variable distribution, used in finding the optimal cut in node splitting; the value -1 invokes an algorithm that tests all possible cuts on the training sample and finds the best one, but it is slower than the coarse grid
Figure 4.12: The MLP network architecture from TMVA tool.
out on new datasets. The solution I have adopted is to split the data into two
sets: the training set, used to train the model, and the test set, used to test it.
Evaluating the model on these sets it is possible to obtain an estimate of the
generalization error of the model and of the goodness of the created
discriminator (see appendix A for details). Both the signal and background samples
are therefore divided into subsets; in particular, 70% for training and 30% for
testing turned out to be the optimal choice, ensuring a good balance between
learning capability and overtraining.
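The 70/30 split can be sketched as follows (illustrative code, not the TMVA internal splitting):

```python
import random

def train_test_split(events, train_fraction=0.70, seed=7):
    """Random split into training and test sets, with the 70/30 proportion
    found to balance learning capability against overtraining."""
    idx = list(range(len(events)))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train_fraction * len(events)))
    train = [events[i] for i in idx[:n_train]]
    test = [events[i] for i in idx[n_train:]]
    return train, test

train, test = train_test_split(list(range(100)))
```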
The shape of the two trained discriminators is shown in figure 4.13, for signal
and background muons. Test and training are superimposed.
One of the problems that a ML algorithm can encounter, as said in the
previous paragraph, is overtraining, or more simply a disagreement between
a good performance on the training set and a bad performance on the test set;
in order to assess the compatibility between the discriminator outputs obtained
on the test and training samples, a Kolmogorov-Smirnov (KS) test is performed,
which quantifies this disagreement and provides a probability value between 0
and 1: values close to zero indicate a small probability of compatibility.
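A minimal sketch of the two-sample KS statistic on which the test is based (the probability quoted below is then derived from this maximum distance):

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum distance
    between the empirical cumulative distributions of the two samples
    (here, discriminator outputs on the training and test sets)."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(s, x):
        # fraction of entries in s that are <= x
        return bisect.bisect_right(s, x) / len(s)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)
```

Identical samples give a distance of 0 (maximal compatibility); fully disjoint samples give 1.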
Figure 4.13: The MLP network (a) and BDT (b) response: training and
testing outputs are superimposed.
This test should in theory be used only for unbinned data and not for binned
data, as in the case of a histogram, for which the test returns a non-uniform
distribution between 0 and 1. However ROOT provides the function to
implement this statistical control². If the distributions of the two samples
under comparison are identical, the probability value becomes 1.
The KS test performed on the BDT and MLP gave the following results:
MLP: KS sig (bkg) probability = 1 (0.993).
BDT: KS sig (bkg) probability = 0.999 (0.563).
It can be noticed that the BDT response performance is not optimal for the
background.
The discrimination performance of both classifiers is shown in the ROC curves
in figure 4.14.
Figure 4.14: ROC curves. As a comparison, the ROC curve of the simplest
training algorithm, the Linear Discriminant, is also reported (see
Appendix A).
Since the ultimate goal of this study is to improve the discrimination of real
from fake muons, looking at the score for a given classifier, a set of possible
choices for a working point to be used as a selection is provided in table 4.8,
for several levels of background contamination. The first one appears to be
the most interesting, since it ensures a fake rate of around 1%.
2 The probability values for binned data will be shifted slightly higher than expected,
depending on the effects of the binning.
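The working-point scan behind table 4.8 can be sketched as follows: for each allowed background contamination, pick the loosest cut that satisfies it and report the corresponding signal efficiency (toy scores, illustrative code):

```python
def working_points(sig_scores, bkg_scores, bkg_levels=(0.01, 0.10, 0.30)):
    """For each background contamination level, find the loosest cut whose
    background efficiency does not exceed it, and report that cut together
    with the corresponding signal efficiency."""
    cuts = sorted(set(sig_scores) | set(bkg_scores))
    points = {}
    for level in bkg_levels:
        for cut in cuts:
            bkg_eff = sum(s > cut for s in bkg_scores) / len(bkg_scores)
            if bkg_eff <= level:
                sig_eff = sum(s > cut for s in sig_scores) / len(sig_scores)
                points[level] = (cut, sig_eff)
                break
    return points
```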
Table 4.8: Final performances.

MVA Method | Signal efficiency: from test sample (from training sample)
           | @B=0.01       | @B=0.10       | @B=0.30
MLP        | 0.975 (0.973) | 1.000 (1.000) | 1.000 (1.000)
BDT        | 0.978 (0.978) | 1.000 (1.000) | 1.000 (1.000)
The separation power (SP) is equal to 49.5, computed as the quotient of the
signal efficiency (ε) and the background efficiency (1-r), SP = ε/(1-r), where
r is the background rejection.
Even if the BDT reaches slightly higher efficiency values, at the same time
it shows an irregular profile (figure 4.14), because it suffers from overtraining,
in particular on the background dataset, as the KS test demonstrated. The
MLP showed much more robust performance and has thus been selected for
this analysis.
4.7 Validation in the control region DS → φπ
and in the Minimum Bias events
In order to validate the performance of the MLP-based low momentum muon
identification, efficiency and fake rate are computed in dedicated control
regions, completely uncorrelated with the event samples used for the training.
The control region chosen to check the performance on signal muons is
pp → DS → φπ, where the muon candidates come from φ → µµ. Indeed the
final state muons produced in this φ decay have a very soft pT spectrum,
like the muons coming from supersymmetric light boson decays or from LFV
decays of charged leptons. On the other hand, the production cross section
and BR involved in the DS → φπ → (2µ)π process are completely known
and well determined, so this process is used as a standard candle to probe
the efficiency of the new algorithm.
I generated ∼151k DS → φπ → (2µ)π events with PYTHIA 8, starting
from a Minimum Bias event configuration and requiring at least one DS
meson per event. The DS → φπ and φ → µµ decays are then forced with
BR = 1. The standard CMS event reconstruction algorithms are then applied
to the final state particles, as described in Section 4.2. The 2017 realistic data
taking conditions are simulated, e.g. the pile-up interactions are simulated
according to the distribution shown in figure 2.14 (b).
A set of selections is applied on reconstructed events and variables, in order
to select a good di-muon candidate:
- 2 opposite sign (OS) muons with pT > 0.5 GeV and |η| < 2.4;
- the invariant mass of the di-muon system is required to fall in the φ mass window 1 < mµµ < 1.04 GeV, while the invariant mass of the triplet is required to fall in the DS invariant mass peak 1.93 < mµµ+1tr < 2.01 GeV;
- at least one track with pT > 2 GeV, |η| < 2.4, charge ≠ 0, dxy < 0.3 cm and dz < 20 cm;
- at least 1 triplet with mass compatible with the DS invariant mass window, |charge| = 1 and χ2 of the vertex fit between 0 and 15.
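The di-muon mass requirement above can be sketched by building the invariant mass from the reconstructed (pT, η, φ) of the two muons (an illustrative helper, not the CMSSW candidate combiner):

```python
import math

MUON_MASS = 0.1057  # GeV

def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2, m=MUON_MASS):
    """Invariant mass of a muon pair built from (pT, eta, phi), as used to
    test the phi mass window 1 < m_mumu < 1.04 GeV."""
    def four_vector(pt, eta, phi):
        px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
        return math.sqrt(px * px + py * py + pz * pz + m * m), px, py, pz
    e1, px1, py1, pz1 = four_vector(pt1, eta1, phi1)
    e2, px2, py2, pz2 = four_vector(pt2, eta2, phi2)
    m2 = (e1 + e2) ** 2 - (px1 + px2) ** 2 - (py1 + py2) ** 2 - (pz1 + pz2) ** 2
    return math.sqrt(max(0.0, m2))
```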
If more than one triplet is found, the one with the smallest χ2 is chosen. The
invariant mass distributions before the selections on the mass windows are
shown in figure 4.15.
Figure 4.15: φ (a) and DS (b) invariant mass, before the selection in the mass
windows reported in the text.
The fake rate instead has been computed using a Minimum Bias MC dataset
and selecting events with at least one pion or kaon at generator level (by
requiring PdgId = 321 || PdgId = 211).
Figure 4.16 shows the shape of the discriminator in the MC DS → φπ sample,
evaluated on the muons coming from the φ decay, and in the MC MinimumBias
sample, evaluated on hadrons tagged as reconstructed muons.
Figure 4.16: Discriminator's shape in MC DS → φπ and MC
MinimumBias, for 2 muons.
I studied the efficiency and fake rate of the standard identification algorithms
(Soft, Loose and Tight ID) together with the efficiency of the MLP-based low
momentum muon identification. The results are plotted as a function of pT,
as shown in figure 4.17.
Finally, in table 4.9, the numerical values of these efficiencies are reported,
for transverse momentum between 0 and 5 GeV.
It should be noted that the MLP-based low momentum muon identification
(MLP-Ultra-Soft muon ID in the figure) shows high efficiency, comparable to
the one of the standard muon ID algorithms (Soft ID and Loose ID). In
addition, the fake rate obtained with the MLP-Ultra-Soft muon ID is reduced by
a factor of 12 with respect to the Loose ID and of 25 with respect to the Soft ID
in the lowest pT bins (0.5-2 GeV). As expected, the performance slightly
degrades when pT increases, for pT > 4 GeV, since the new algorithm has been
trained with pT < 6 GeV. In particular, the fake rate becomes comparable with
the Loose ID for pT > 4 GeV.
Figure 4.17: Efficiency vs pT for signal (MC DsPhiPi sample) and
background (MinimumBias sample).
In summary, the standard CMS Soft and Loose IDs, commonly used for
searches involving low-pT muons in the final state, show a very high efficiency,
close to 100%, for muons from the LFV τ decay, but also a large fake rate,
about 40-50%, for hadrons coming from pile-up interactions.
The new MLP-Ultra-Soft muon ID algorithm is optimized to identify low-pT
muons (pT < 4 GeV), targeting a close to 100% efficiency and an O(10−2) fake
rate. These performances have successfully been reached in the training phase,
and then cross-checked in independent control samples (DS → φπ, φ → µµ
for signal and Minimum Bias events for background).
In particular, for pT < 4 GeV, the efficiency of the new algorithm is comparable
to the Loose and Soft IDs, while an average reduction of a factor 4 is
observed in the hadron → muon fake rate in the same region. In the lowest
pT bins (0.5-2 GeV), a fake rate reduction of a factor 12 is observed with
respect to the Loose ID and of a factor 25 with respect to the Soft ID.
Table 4.9: Summary of the efficiency for all ID algorithms used in this study.
Conclusion
An innovative algorithm for low-momentum muon identification in the
CMS experiment has been presented, using DS → τντ → 3µντ as a physics
study case.
The pT spectrum of the muons coming from this decay is particularly soft,
with the most abundant contribution coming from the range below 5 GeV. This
study identified light flavor mesons erroneously identified as muons as a
relevant component of the background.
Although the standard muon reconstruction and identification algorithms show
excellent performance for pT > 4 GeV, further optimizations are needed in
the softer spectrum, in order to reach the same level of performance.
To this purpose, I have implemented a Machine Learning approach to
distinguish signal muons from background ones, exploiting multivariate
analysis. This technique consists in combining a set of variables in order to
extract one single output, capable of classifying signal and background by
exploiting underlying patterns and correlations. The discriminators have been
trained, providing events simulated with the Monte Carlo technique as input
(muons from DS → τντ, τ → 3µ for the signal and custom pion and kaon
guns for the background). In this work, I compared a Boosted Decision Tree
and a Neural Network, choosing the latter for its better performance.
The performance of the new MLP-based low momentum muon identification,
the MLP-Ultra-Soft muon ID, has been measured in dedicated signal-enriched
and background-enriched datasets, respectively DS → φπ and Minimum
Bias.
The new muon identification algorithm shows, for pT < 4 GeV, an efficiency
comparable to the Loose and Soft IDs, while an average reduction of a factor
4 is observed in the hadron → muon fake rate. In the lowest pT bins (0.5-2
GeV), a fake rate reduction of a factor 12 is observed with respect to the
Loose ID and of a factor 25 with respect to the Soft ID.
The results obtained show that Machine Learning techniques are a promising
alternative to conventional analysis. The algorithm proposed here will be
further optimized by introducing new variables, based on detector response
and reconstruction quality (e.g. cluster size of the hits, pull between the
measured hit and the one belonging to the fitted trajectory).
A further improvement in the training phase would come from exploiting
realistic physics processes (e.g. D+ → µ+νµK−π+), rather than the particle
guns. Indeed a limitation of this study is the statistics available in the
Monte Carlo datasets used to train the discriminators, partially overcome by
privately generating them at the Bari Tier-2 computing center.
The results achieved in this study for background suppression and pile-up
mitigation have been obtained in the Run 2 data taking conditions, with an
average of 30 pile-up interactions per bunch crossing. These results allow
the development of a strategy to preserve the sensitivity of the CMS experiment
in the search for new physics involving low momentum muons in the final
state. This will be extremely challenging in the Run 3 data taking conditions,
with 50 pile-up interactions, and in particular in the high luminosity scenario,
characterized by up to 200 pile-up interactions per bunch crossing.
Appendix A
Introduction to Multivariate
Analysis and statistical learning
At LHC, and more generally in High Energy Physics (HEP), large quantities
of data are gathered and, since many of the studied processes happen in a
tiny fraction of the collisions, sophisticated techniques are required to
discriminate the events of interest (defined as signal) from the background:
the strategy is to use input information from multiple variables from various
sources, thus performing the so-called Multivariate Analysis (MVA). In
particular it becomes useful in order to investigate very rare phenomena or to
reduce huge backgrounds.
The goal in this direction is the fusion between MVA techniques and Machine
Learning (ML) algorithms, which can automatize the analysis process.
A.1 Definitions and basic concepts
Classical statistics is also known as univariate, looking at one variable at
a time; however in a statistical problem there is often more than one variable
involved, so a univariate analysis can lead to wrong conclusions. It is often
necessary to study or measure more than one variable simultaneously to
understand a process or any set of samples with numerous measurements:
Multivariate Analysis is a statistical analytical approach that simultaneously
evaluates multiple input variables or features, called predictors, to provide
one output variable, called the response [48]; essentially, it is a tool to predict
the effect a change in one variable will have on other variables.
Machine Learning, or statistical learning, is a group of multivariate analytic
methods that train on a data sample in order to make predictions on unknown
datasets.
From a more strictly mathematical point of view, suppose that we observe
a quantitative response Y and p different predictors, X1, X2, ..., Xp. We
assume that there is some relationship between Y and X = (X1, X2, ..., Xp),
which can be written in the very general form as
Y = f(X) + ε
Here f is some fixed but unknown function of X1, X2, ..., Xp and ε is a random
error term, which is independent of X and has mean zero; f represents
the systematic information that X provides about Y [48].
In essence, statistical learning refers to a set of approaches for estimating f.
The aim of ML techniques is the application of statistical learning methods
to training data in order to estimate the unknown function f, minimizing
the reducible error, which arises from the fact that only an estimate of f is
possible.
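A toy illustration of estimating f from noisy data: the learner only sees (x, y) pairs generated as Y = f(X) + ε and fits a simple least-squares line (all numbers are illustrative):

```python
import random

# The "true" systematic part f is unknown to the learner, which only
# observes noisy (x, y) pairs drawn as y = f(x) + eps.
rng = random.Random(1)
f = lambda x: 2.0 * x + 1.0
data = [(x, f(x) + rng.gauss(0.0, 0.1)) for x in [i / 10 for i in range(50)]]

def fit_line(points):
    """Least-squares estimate of a linear f-hat from training pairs."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

slope, intercept = fit_line(data)
```

The fitted slope and intercept approach the true values 2 and 1; the residual spread is the irreducible error due to ε.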
A.1.1 Supervised and Unsupervised learning
The ML methods can be classified into supervised and unsupervised. In the
first case, the training data that feed the algorithm include the desired
solutions, called labels: the learning happens by examples, i.e. the algorithm
extracts patterns from the training data.
In contrast, unsupervised learning describes the situation in which for every
observation we have a vector of measurements but no associated response.
It is referred to as unsupervised because we lack a response variable that can
supervise our analysis [48]. This last type is not common in HEP (High
Energy Physics) and is not covered in this chapter.
Typical supervised learning tasks are classification and regression.
A.1.2 Classification vs Regression
The features can be characterized as either quantitative or qualitative.
Quantitative variables take on numerical values; in contrast, qualitative ones take
on values in one of several classes or categories. We tend to refer to problems
with a quantitative response as regression problems, while those involving a
qualitative response are often referred to as classification problems.
In the classification task, the algorithm is trained with datasets that contain
signal and background events (we will restrict ourselves here to the two-class
case, but many classifiers can in principle be extended to several classes) and
it must learn how to classify new datasets. For example, figure A.1 shows a
simple response of a classification algorithm in two dimensions. A possible
solution (without any misclassification) is shown by the curved decision
boundary, and any new event given by two coordinates would be classified
according to the two sides of this boundary.
Figure A.1: A simple 2D example for a classification problem. The circles
symbolise the "signal", events with Y = 1, the squares stand for the
"background", events with Y = 0 [47].
In the regression task, the value of some quantity of interest, called the target
value, is predicted, given a set of input features. A one-dimensional regression
problem is shown in figure A.2. The seven crosses represent the data points
("examples") and the smooth curve may be a solution formed by a statistical
learning method. Any new event given by an x-coordinate will result in a
y-coordinate output according to the learned curve.
A.1.3 How does one choose a statistical learning method?
In statistics, no one method dominates all others over all possible data sets:
selecting the best approach can be one of the most challenging parts of per-
forming statistical learning in practice [48].
Figure A.2: A simple one-dimensional example for a regression problem [47].
Firstly, a method is selected on the basis of whether the response is qualitative
or quantitative. Since the study of this thesis is based on classification
algorithms, we will concentrate on the qualitative setting.
It is possible, for example, to evaluate the performance of a classifier using
histograms of the output distributions for signal and background. Well
designed classification methods do not only give outputs 0 or 1, but give
continuous values in the interval [0, 1], which can be interpreted as a
probability. A value of 0.5 then means that the event could belong to either
class with almost the same probability.
The great advantage of a continuous output between 0 and 1 shows up when
a cut or threshold is defined to do the actual classification. The signal
efficiency ε is then given by the percentage of recognised "good" events (output
> cut) and the background rejection r (1 - background efficiency) is given by
the percentage of recognised "bad" events (output < cut) [47].
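These two quantities can be sketched directly from the classifier outputs (illustrative code):

```python
def efficiency_and_rejection(sig_out, bkg_out, cut):
    """Signal efficiency eps (fraction of signal outputs above the cut) and
    background rejection r = 1 - background efficiency, for a classifier
    output in [0, 1]."""
    eps = sum(o > cut for o in sig_out) / len(sig_out)
    bkg_eff = sum(o > cut for o in bkg_out) / len(bkg_out)
    return eps, 1.0 - bkg_eff
```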
How does one choose the best threshold? The ROC curve is a popular tool for
displaying the results for all possible cuts in the efficiency/rejection plane. The
name "ROC" is historic and comes from communications theory, as an acronym
for receiver operating characteristics. Figure A.3 shows an example of a
ROC curve: it starts with no rejection and 100% efficiency for cut = 0 and
ends at 100% rejection and no efficiency for cut = 1.
The dotted line (ε + r = 1) represents the ROC curve of a purely random
classifier or pre-scaling. A good classifier stays as far away from that line as
possible, toward the top right corner with full efficiency and full rejection.
The overall performance of a classifier, summarized over all possible
thresholds, is given by the area under the curve (AUC): a perfect classifier will have
an AUC = 1, whereas a purely random classifier will have an AUC = 0.5 [49].
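The AUC can be sketched via its probabilistic interpretation, without building the curve explicitly (illustrative code):

```python
def auc(sig_out, bkg_out):
    """Area under the ROC curve as the probability that a randomly chosen
    signal event scores higher than a randomly chosen background event
    (ties count one half)."""
    wins = sum((s > b) + 0.5 * (s == b) for s in sig_out for b in bkg_out)
    return wins / (len(sig_out) * len(bkg_out))
```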
Figure A.3: An example of a ROC curve: at the marked point an efficiency of
97% is achieved with a cut at 0.23, resulting in a background rejection of
80%. The separation power is 4.9 [47].
The larger the AUC, the better the classifier.
ROC curves are useful for comparing different classifiers, since they take into
account all possible thresholds: several curves may be compared depending
on the application and on the separation power, calculated as the quotient
of the signal efficiency and the background efficiency, SP = ε/(1-r) [47].
These criteria fix a specific cut and thus set a "working point" for the classifier.
A.1.4 Problems
Since the main task is to select a learning algorithm and train it on some
data, the two things that can go wrong are a "bad algorithm" and "bad data".
A first problem concerns an insufficient quantity of training data: the algorithm
needs thousands of examples to learn and it is crucial that the training data
be representative of the new cases we want to generalize to.
If the sample is too small, we will have sampling noise (i.e., nonrepresentative
data as a result of chance), but even very large samples can be nonrepresentative
if the sampling method is flawed. This is called sampling bias [49].
A critical part of the success of a ML project also arises from a good set of
features to train on. This process, called feature engineering [49], involves:
- Feature selection: selecting the most useful features to train on among existing features.
- Feature extraction: combining existing features to produce a more useful one.
- Creating new features by gathering new data.
The only way to know how well a model will generalize to new cases is to
actually try it out on new cases. A better option is to split the data into two
sets: the training set to train the model and the test set to test it. The error
rate on new cases is called the generalization error, and by evaluating the model
on the test set we get an estimate of this error. This value tells how well the
model will perform on new instances.
Monte Carlo simulations are a standard way of generating training data
examples. But they must be used with care: both underlying physics and
the detector response have to be understood very well to create a simulation
which generates events matching the experimental observations. Even very
small deviations or correlations that exist in the simulation but not in reality
may result in a trained method that handles simulated events perfectly, but
shows a behaviour like random guessing on real data [47].
Overfitting and Underfitting
If the training error is low (i.e. the model makes few mistakes on the training
set) but the generalization error is high (i.e. the model does not generalize
well), it means that the model is overfitting the training data [49].
On the opposite side there is the underfitting phenomenon: it occurs when
the model is too simple to learn the underlying structure of the data. The
main options to fix this problem are: selecting a more powerful model, with
more parameters, or feeding better features to the learning algorithm.
Summarizing: the system will not perform well if the training set is too
small, or if the data is not representative, noisy, or polluted with irrelevant
features. The model needs to be neither too simple (it will underfit) nor too
complex (it will overfit) [49].
A.1.5 Data Preprocessing
Preprocessing data means transforming inputs x which are directly measured
by the detector into new inputs x' which are better suited to describe the
event in one or more of the following senses:
- the transformed inputs may make use of prior knowledge;
- the transformation may re�ect a symmetry that is inherent to all events;
- if the input space is very high-dimensional and it is unknown how to reduce
the dimensionality1, a transformation based on automatic procedures can be
very helpful [47].
Preprocessing can be useful to reduce correlations among the variables, to
transform their shapes into more appropriate forms, or to accelerate the re-
sponse time of a method.
For supervised methods there are four main variable transformation methods:
normalisation, decorrelation, PCA and gaussianisation. For more details see
Ref. [46].
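For illustration only (the helper names below are mine, not TMVA's), two of these transformations, normalisation and PCA-based decorrelation, can be sketched with NumPy:

```python
import numpy as np

def normalise(X):
    """Z-score normalisation: zero mean and unit variance per variable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_transform(X):
    """Decorrelate the variables by rotating onto the eigenvectors of
    the covariance matrix (principal component analysis)."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]     # sort by explained variance
    return Xc @ eigvecs[:, order]

# Two strongly correlated toy variables
rng = np.random.default_rng(1)
x1 = rng.normal(0, 2, 2000)
X = np.column_stack([x1, 0.8 * x1 + rng.normal(0, 1, 2000)])

Xn = normalise(X)
Xp = pca_transform(X)
off_diag = np.cov(Xp, rowvar=False)[0, 1]  # ~0: variables are decorrelated
```

After the rotation the off-diagonal covariance vanishes, which is exactly the decorrelation effect mentioned above.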
A.2 Classification methods
There are many possible classification techniques, or classifiers, that one
might use to predict a qualitative response, but I will discuss only the methods
that I used in my analysis, starting from the simplest (Linear Discriminant),
used only for a performance comparison, up to the two more computer-intensive
methods (Boosted Decision Tree and Multilayer Perceptron).
A.2.1 Linear Discriminant Analysis
The linear discriminant analysis (LD) algorithm uses a linear model, where
linear refers to the discriminant function y(x), linear in the parameters β:

y(x) = x^T β + β_0

where β_0 (the bias) is adjusted so that y(x) ≥ 0 for S and y(x) < 0 for B [46].
1Many ML problems involve thousands or even millions of features for each training
instance. Not only does this make training extremely slow, it can also make it much harder
to find a good solution: this is referred to as the curse of dimensionality. Many things behave very
differently in high-dimensional space: the more dimensions the training set has, the greater
the risk of overfitting it [49].
Description and implementation
Assuming that there are m+1 parameters β_0, ..., β_m to be estimated using a
training set comprised of n events, the equation for β is Y = Xβ, where

Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\quad \text{and} \quad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1m} \\ 1 & x_{21} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nm} \end{pmatrix}

The constant column in X represents the bias β_0, absorbed into the vector
β, and Y is composed of the target values, with y_i = 1 if the i-th event belongs
to the S class and y_i = 0 if the i-th event belongs to B.
Applying the method of least squares, we obtain the normal equations
for the classification problem, given by

X^T X β = X^T Y \iff β = (X^T X)^{-1} X^T Y

If weighted events are used, this is simply taken into account by introducing
a diagonal weight matrix W and modifying the normal equations as follows:

β = (X^T W X)^{-1} X^T W Y

Considering two events x_1 and x_2 on the decision boundary, we have
y(x_1) = y(x_2) = 0 and hence (x_1 − x_2)^T β = 0. Thus the LD can be
geometrically interpreted as determining the decision boundary by finding
a vector β orthogonal to it [46].
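As a sketch of these normal equations (on a toy two-variable dataset of my own; the actual LD implementation is the one of ref. [46]), the fit can be written directly with NumPy. With targets encoded as {1, 0}, the natural cut on the least-squares response is at 0.5:

```python
import numpy as np

def fit_linear_discriminant(X, y, weights=None):
    """Solve beta = (X^T W X)^(-1) X^T W Y; a column of ones absorbs beta_0."""
    Xb = np.column_stack([np.ones(len(X)), X])
    W = np.diag(weights) if weights is not None else np.eye(len(X))
    return np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)

def ld_response(X, beta):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# Toy problem: signal (y=1) and background (y=0) shifted in two variables
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(+1, 1, (500, 2)), rng.normal(-1, 1, (500, 2))])
y = np.concatenate([np.ones(500), np.zeros(500)])

beta = fit_linear_discriminant(X, y)
accuracy = np.mean((ld_response(X, beta) > 0.5) == y)
```

The `weights` argument mirrors the diagonal weight matrix W used for weighted events.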
Variable ranking
This implementation of the LD provides a definition of input variable importance,
or ranking, based on the coefficients of the variables in the linear combination
that forms the decision boundary. The order of importance of the variables
is assumed to agree with the order of the absolute values of the coefficients.
Performance
The LD is optimal for Gaussian distributed variables with linear correlations:
no discrimination is achieved when a variable has the same sample
mean for signal and background, but the LD can often benefit from suitable
transformations of the input variables.
A.2.2 Boosted Decision Tree
A decision tree is a binary tree classifier similar to the one in figure A.4.
Figure A.4: Schematic view of a decision tree [46].
Starting from the root node, a sequence of recursive binary splits using the
discriminating variables x_i is applied to the data. Each split uses the variable
that at this node gives the best separation between signal and background
when being cut on. The same variable may thus be used at several nodes,
while others might not be used at all. The node splitting stops once it has
reached the minimum number of events, the maximum number of nodes or
the maximum depth, all specified in the BDT configuration. The leaf nodes
at the bottom end of the tree are labeled "S" and "B" depending on the
majority of events that end up in the respective nodes [46].
Description and implementation
The decision tree is able to split the phase space into a large number of
hypercubes or regions, each of which is identified as either "signal-like" or
"background-like". The resulting decision boundary is not linear.
Each observation is assigned to the most commonly occurring class among the
training observations in the region to which it belongs. To grow such a
structure, a criterion called the classification error rate is used: it is defined
as the fraction of the training observations in a region that do not belong
to the most common class [48].
However, it turns out that the classification error is not sufficiently sensitive for
tree-growing: two other measures are preferable. One is the Gini index, a
measure of the total variance across the K classes, better thought of as a measure
of node purity: a small value indicates that a node contains predominantly
observations from a single class, so Gini = 0 means that a node is pure. The
other measure is the cross-entropy, which takes on a small value if the m-th
node is pure. They are very similar: Gini is slightly faster to compute and
tends to isolate the most frequent class in its own branch of the tree; entropy
tends to produce slightly more balanced trees [49].
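The two impurity measures can be made concrete with a short sketch (pure NumPy, taking the K class fractions of a node as input):

```python
import numpy as np

def gini(p):
    """Gini index: sum_k p_k (1 - p_k) over the class fractions in a node."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

def cross_entropy(p):
    """Cross-entropy (deviance): -sum_k p_k log(p_k), with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

pure  = [1.0, 0.0]   # node containing a single class
mixed = [0.5, 0.5]   # maximally impure two-class node
```

Both measures vanish for a pure node and are maximal for the 50/50 mixture, which is exactly the behaviour a node-splitting criterion needs.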
A shortcoming of decision trees is their instability with respect to statistical
fluctuations in the training sample from which the tree structure is
derived. For example, if two input variables exhibit similar separation power,
a fluctuation in the training sample may cause the tree-growing algorithm
to split on one variable, while the other variable could have been
selected without that fluctuation. In such a case the whole tree structure
below this node is altered, possibly resulting in a substantially different
response.
This problem is overcome by constructing multiple decision trees (a forest)
and classifying an event by a majority vote of the classifications done by
each tree in the forest, using bagging, randomising and boosting techniques.
The trees are derived from the same training ensemble by reweighting events,
and are finally combined into a single classifier which is given by a (weighted)
average of the individual decision trees [46].
Bagging
The decision tree presented so far has the drawback of high variance: if we
split the training data into two parts at random and fit a decision tree to
both halves, the results could be quite different. In contrast, a procedure
with low variance will yield similar results if applied repeatedly to distinct
data sets. Bagging is a general-purpose procedure for reducing the variance
of a statistical learning method: given a set of n independent observations
each with variance σ², the variance of their mean is σ²/n, so averaging a set
of observations reduces the variance [48].
It is a resampling technique where a classifier is repeatedly trained using
different resampled training events, such that the combined classifier represents
an average of the individual classifiers. Resampling includes the possibility
of replacement, which means that the same event is allowed to be (randomly)
picked several times from the parent sample. The algorithm predicts the final
answer via simple majority voting: the overall prediction is the most commonly
occurring class among the predictions.
Training several classifiers with different resampled training data and combining
them into a collection results in an averaged classifier that is more
stable with respect to statistical fluctuations in the training sample [46].
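A minimal bagging sketch (decision stumps stand in for full trees, and the dataset is an invented toy sample) shows the resample-train-vote cycle:

```python
import numpy as np

def fit_stump(X, y):
    """Pick the single cut (variable, threshold, sign) with best accuracy."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (+1, -1):
                acc = np.mean((sign * (X[:, j] - t) > 0).astype(int) == y)
                if best is None or acc > best[0]:
                    best = (acc, (j, t, sign))
    return best[1]

def stump_predict(X, stump):
    j, t, sign = stump
    return (sign * (X[:, j] - t) > 0).astype(int)

def bagging(X, y, n_classifiers=25, seed=0):
    """Train each stump on a bootstrap resample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    return [fit_stump(X[idx], y[idx])
            for idx in (rng.integers(0, len(X), len(X))
                        for _ in range(n_classifiers))]

def bagging_predict(X, stumps):
    """Majority vote over the collection of classifiers."""
    votes = np.mean([stump_predict(X, s) for s in stumps], axis=0)
    return (votes > 0.5).astype(int)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(+1, 1, (100, 2)), rng.normal(-1, 1, (100, 2))])
y = np.array([1] * 100 + [0] * 100)
accuracy = np.mean(bagging_predict(X, bagging(X, y)) == y)
```

`rng.integers(0, len(X), len(X))` is the bootstrap draw: the same event may be picked several times, exactly the resampling with replacement described above.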
Random forest
Differently from bagging, with the random forest method each tree is grown
using only one (resampled) subset of the original training events, and at each
split a random subset of the predictors is considered as split candidates.
The main difference between bagging and random forests is thus the choice
of the predictor subset size. Using a small subset size in building a random
forest will typically be helpful when we have a large number of correlated
predictors [48]. Suppose that there is one very strong predictor in the data
set, along with a number of other moderately strong predictors; then in the
collection of bagged trees, most or all of the trees will use this strong predictor
in the top split. Consequently all of the bagged trees will look quite similar
to each other. Hence the predictions from the bagged trees will be highly
correlated, and unfortunately averaging many highly correlated quantities
does not lead to as large a reduction in variance as averaging many uncorrelated
quantities. This means that bagging will not lead to a substantial reduction in
variance over a single tree. Random forests overcome this problem by forcing
each split to consider only a subset of the predictors: on average a fraction
of the splits will not even consider the strong predictor, and the other
predictors will have more of a chance. We can think of this process as
decorrelating the trees.
Boosting
Boosting works in a similar way to bagging, except that the trees are grown
sequentially: each tree is built using information from the previously grown
trees. In this case each tree is fit on a modified version of the original data
set [48]. The boosting approach thus learns slowly, and in general statistical
learning approaches that learn slowly tend to perform well.
Boosting has three tuning parameters: the number of trees (unlike bagging
and random forests, boosting can overfit if this number is too large); the
shrinkage parameter (which controls the rate at which boosting learns,
typically 0.01 or 0.001 depending on the problem); and the number of splits
in each tree, which controls the complexity of the boosted ensemble.
There are two popular boosting methods: adaptive and gradient boost.
Adaptive Boost (AdaBoost) is the most popular: events that were misclassified
during the training of a decision tree are given a higher event weight in
the training of the following tree. Starting with the original event weights
when training the first decision tree, the subsequent tree is trained using a
modified event sample where the weights of previously misclassified events
are multiplied by a common boost weight α, and so on. The algorithm stops
when the desired number of predictors is reached [46].
The boost weight is derived from the misclassification rate, err, of the
previous tree:

α = \frac{1 − err}{err}

The weights of the entire event sample are then renormalised such that the
sum of weights remains constant. We define the result of an individual
classifier as h(x), encoded for signal and background as h(x) = +1 and −1,
respectively. The boosted event classification y_Boost(x) is then given by

y_{Boost}(x) = \frac{1}{N_{collection}} \sum_{i}^{N_{collection}} \ln(α_i) \cdot h_i(x)

where the sum is over all classifiers in the collection. Small (large) values of
y_Boost(x) indicate a background-like (signal-like) event.
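A compact sketch of this AdaBoost loop (weighted decision stumps as the weak classifiers, on an invented toy sample; signal and background are encoded as h = +1 and −1):

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Single cut minimising the weighted misclassification rate err."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(sign * (X[:, j] - t) > 0, 1, -1)
                err = w[pred != y].sum() / w.sum()
                if best is None or err < best[0]:
                    best = (err, (j, t, sign))
    return best

def stump_predict(X, stump):
    j, t, sign = stump
    return np.where(sign * (X[:, j] - t) > 0, 1, -1)

def adaboost(X, y, n_classifiers=15):
    w = np.ones(len(X)) / len(X)
    stumps, alphas = [], []
    for _ in range(n_classifiers):
        err, stump = fit_weighted_stump(X, y, w)
        err = min(max(err, 1e-9), 0.5 - 1e-9)     # guard degenerate rates
        alpha = (1.0 - err) / err                 # the boost weight
        w[stump_predict(X, stump) != y] *= alpha  # boost misclassified events
        w /= w.sum()                              # renormalise the weights
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def y_boost(X, stumps, alphas):
    """(1/N_collection) sum_i ln(alpha_i) h_i(x): large values are signal-like."""
    out = sum(np.log(a) * stump_predict(X, s) for s, a in zip(stumps, alphas))
    return out / len(stumps)

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(+0.7, 1, (150, 2)), rng.normal(-0.7, 1, (150, 2))])
y = np.array([1] * 150 + [-1] * 150)
stumps, alphas = adaboost(X, y)
accuracy = np.mean(np.sign(y_boost(X, stumps, alphas)) == y)
```

Since err < 0.5 for the best stump, every boost weight α is greater than 1, so misclassified events gain weight in the next round, as the text describes.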
Gradient Boost is a simple additive expansion approach. It works by
sequentially adding predictors to an ensemble, each one correcting its
predecessor. However, instead of modifying the instance weights at every
iteration like AdaBoost, it tries to fit the new predictor to the residual
error made by the previous predictor [49].
Its robustness can be enhanced by reducing the learning rate.
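A residual-fitting sketch (regression stumps on an invented one-dimensional toy target; `learning_rate` plays the role of the shrinkage parameter mentioned earlier):

```python
import numpy as np

def fit_regression_stump(x, r):
    """Single split on x minimising the squared error of the residuals r."""
    best = None
    for t in np.unique(x)[:-1]:          # last value would leave one side empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or sse < best[0]:
            best = (sse, (t, left.mean(), right.mean()))
    return best[1]

def stump_value(x, stump):
    t, vl, vr = stump
    return np.where(x <= t, vl, vr)

def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Each new stump is fit to the residual error of the current ensemble."""
    pred = np.full(len(x), y.mean())
    for _ in range(n_trees):
        residual = y - pred                  # what the ensemble still misses
        stump = fit_regression_stump(x, residual)
        pred += learning_rate * stump_value(x, stump)  # shrinkage: learn slowly
    return pred

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-3, 3, 200))
y = np.sin(x) + rng.normal(0, 0.1, 200)
mse = np.mean((y - gradient_boost(x, y))**2)
```

Each round fits the residual of the previous rounds rather than reweighting events, which is the key difference from AdaBoost.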
Variable ranking
A ranking of the BDT input variables is derived by counting how often the
predictors are used to split decision tree nodes, and by weighting each split
occurrence by the squared separation gain it has achieved and by the number
of events in the node (as measured by the Gini index). This measure of the
variable importance can be used for a single decision tree as well as for a
forest [46].
Performance
Depending on the problem, if the relationship between the features and the
response is well approximated by a linear model, then a traditional
classification algorithm such as the LD will work well. If instead there is a
highly non-linear and complex relationship, then decision trees may
outperform classical approaches [48].
Decision trees are also insensitive to the inclusion of poorly discriminating
input variables. While for artificial neural networks, as we will see, it is
typically more difficult to deal with such additional variables, the decision
tree training algorithm will essentially ignore non-discriminating variables,
since at each node splitting only the best discriminating variable is used.
However, the simplicity of decision trees has the drawback that their
theoretically best performance is generally lower than that of other techniques
like neural networks; moreover they are more prone to overtraining.
A.2.3 Multilayer Perceptron
Artificial Neural Networks (ANN) were first introduced in data analysis
by the neurophysiologist W. McCulloch and the mathematician W. Pitts [49].
An ANN is a computational model vaguely inspired by the biological neural
connections that constitute a human brain, specifically suited to non-linear
learning problems and capable of computing any logical proposition we want.
By applying an external signal to some binary on/off inputs, the network is
put into a defined state that can be measured from the response of one or
more binary outputs. One can therefore view the neural network as a mapping
from a space of input variables x_1, ..., x_n onto a one-dimensional (e.g. in
the case of a signal vs background discrimination problem) or multi-dimensional
space of output variables y_1, ..., y_m [46].
There are several types of ANN algorithms, but I will discuss here only the
one used in this work: the Multilayer Perceptron (MLP).
Description and implementation
The basic unit of computation in a neural network is the neuron, often called
a node or unit. The Perceptron, invented in 1957 by Rosenblatt [49], is a
particular artificial neuron, called a Linear Threshold Unit (LTU) (see figure
A.5): the inputs and output are numbers rather than binary values, and
each input connection is associated with a weight, which tells the neuron to
respond more to one input and less to another.
Figure A.5: An example of a linear threshold unit [49].
The LTU computes a weighted sum of its inputs,

z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = w^T x,

then applies a step function (commonly the Heaviside function or the sign
function) to that sum and outputs a non-linear ℝ → ℝ function, called the
activation function:

h_w(x) = step(z) = step(w^T x)

The purpose of the activation function is to introduce non-linearity into the
output of a neuron. This is important because most real-world data is
non-linear and we want neurons to learn these non-linear representations.
There are several activation functions we may encounter in practice: linear,
sigmoid, tanh or radial, but tanh and sigmoid are the most used.
A single LTU can be used for simple linear binary classification: it computes
a linear combination of the inputs and, if the result exceeds a threshold, it
outputs the positive class, otherwise the negative one.
Training an LTU means finding the right values for the weights w_i.
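A hand-wired LTU illustrates this (the weights below are chosen by hand so that the unit computes a logical AND, a classic linearly separable problem; this example is mine, not from the references):

```python
import numpy as np

def ltu_output(x, w):
    """Linear threshold unit: z = w^T x, output = step(z)."""
    z = np.dot(w, x)
    return 1 if z >= 0 else 0          # Heaviside step as the activation

# Weights for a logical AND of two binary inputs; the last weight
# multiplies the constant bias input x0 = 1.
w = np.array([1.0, 1.0, -1.5])

outputs = [ltu_output(np.array([a, b, 1.0]), w)
           for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
# outputs == [0, 0, 0, 1]: the unit fires only when both inputs are on
```

Training would mean searching for such a weight vector instead of fixing it by hand.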
A Perceptron is simply composed of a single layer of LTUs, with each neuron
connected to all the inputs: these connections are often represented using
special passthrough neurons called input neurons, which just output whatever
input they are fed. Moreover, an extra bias feature is generally added (x_0 =
1), typically represented using a special type of neuron called a bias neuron,
which just outputs 1 all the time. The main function of the bias is to provide
every node with a trainable constant value (in addition to the normal
inputs that the node receives). Figure A.6 shows, for example, a Perceptron
with 2 inputs and 3 outputs: it can classify instances simultaneously into
three different binary classes, which makes it a multi-output classifier.
Figure A.6: An example of a Perceptron diagram [49].
Generally the Perceptron is trained in this way: it is fed one training instance
at a time and for each instance it makes its predictions. For every output
neuron that produced a wrong prediction, it reinforces the connection weights
from the inputs that would have contributed to the correct prediction.
To improve the Perceptron performance, a multi-layer structure has been
introduced: it is composed of one passthrough input layer, one or more layers
of LTUs, called hidden layers, and one final layer of LTUs called the output
layer. Except for the latter, every layer includes a bias neuron and is fully
connected to the next layer (see figure A.7): this simple structure is an MLP.
When an ANN has two or more hidden layers it is called a Deep Neural
Network (DNN).
The behaviour of the network is determined by the layout of the neurons, the
weights of the inter-neuron connections and the response of the neurons to
the input, described by the neuron response function ρ that maps the neuron
inputs i_1, ..., i_n onto the neuron output.
In a simple MLP we can change the number of layers, the number of neurons
per layer, the type of activation function used in each layer and the weight
initialization logic: what is the best combination of these parameters?
A practical rule is that for an MLP a single hidden layer is sufficient to
approximate a given continuous function to any precision, provided that
a sufficiently large number of neurons is used in the hidden layer. If the
available computing power and the size of the training data sample suffice,
one can increase the number of neurons in the hidden layer until the optimal
performance is reached. It is likely that the same performance can be
achieved with a network of more than one hidden layer and a potentially
much smaller total number of hidden neurons; this would lead to a shorter
training time and a more robust network [46].

Figure A.7: Multilayer perceptron with one hidden layer [46]. The y_ji are
the labels of the target class of a given sample.

In this analysis one hidden layer with a large number of neurons was chosen
because, when increasing the number of hidden layers, the shape of the
discriminator improved but the algorithm's performance got worse.
A Multilayer Perceptron learns by means of an iterative algorithm, which
computes the output of every neuron in the net and measures the network's
output error, i.e. the difference between the desired output and the actual
output of the network. Its purpose is to minimize this error by acting on the
choice of the weights. This algorithm is commonly called Back-propagation
(BP); to reduce the number of iterations it is possible to use a variant of the
method, labeled BFGS.
Back-propagation (BP)
It is the most common algorithm for adjusting the weights that optimise the
classification performance of the neural network. In detail, the output of
a network (here for simplicity assumed to have a single hidden layer with
a tanh activation function, and a linear activation function in the output
layer) is given by

y_{ANN} = \sum_{j=1}^{n_h} y_j^{(2)} w_{j1}^{(2)} = \sum_{j=1}^{n_h} \tanh\left( \sum_{i=1}^{n_{var}} x_i w_{ij}^{(1)} \right) \cdot w_{j1}^{(2)}

where n_var and n_h are the numbers of neurons in the input layer and in the
hidden layer, respectively, w^{(1)}_{ij} is the weight between input-layer neuron i
and hidden-layer neuron j, and w^{(2)}_{j1} is the weight between hidden-layer
neuron j and the output neuron.
During the learning process the network is supplied with N training events
x_a = (x_1, ..., x_{n_var})_a, a = 1, ..., N. For each training event a, the neural
network output y_{ANN,a} is compared to the desired output y_a ∈ {1, 0}.
An error function E, measuring the agreement of the network response with
the desired one, is defined by

E(x_1, \ldots, x_N \,|\, w) = \sum_{a=1}^{N} E_a(x_a | w) = \sum_{a=1}^{N} \frac{1}{2} \left( y_{ANN,a} - y_a \right)^2

where w denotes the ensemble of adjustable weights in the network. The set
of weights that minimises the error function can be found using the method of
steepest or gradient descent(*), provided that the neuron response function
is differentiable with respect to the input weights. Starting from a random
set of weights w^{(ρ)}, the weights are updated by moving a small distance in
w-space in the direction opposite to the gradient, −∇_w E:

w^{(ρ+1)} = w^{(ρ)} − η ∇_w E

where the positive number η is the learning rate, which is chosen carefully to
balance the speed of learning against the risk of failing to reach the global
minimum of the cost function.
The weights connected with the output layer are updated by

\Delta w_{j1}^{(2)} = -\eta \sum_{a=1}^{N} \frac{\partial E_a}{\partial w_{j1}^{(2)}} = -\eta \sum_{a=1}^{N} \left( y_{ANN,a} - y_a \right) y_{j,a}^{(2)}

and the weights connected with the hidden layer are updated by

\Delta w_{ij}^{(1)} = -\eta \sum_{a=1}^{N} \frac{\partial E_a}{\partial w_{ij}^{(1)}} = -\eta \sum_{a=1}^{N} \left( y_{ANN,a} - y_a \right) \left( 1 - \left( y_{j,a}^{(2)} \right)^2 \right) w_{j1}^{(2)} x_{i,a}

where we have used the derivative tanh′(x) = 1 − tanh²(x).
(*) Gradient descent is an optimization algorithm capable of tweaking parameters iteratively in
order to minimize the cost function (the distance between the model's predictions and the training
examples): it measures the gradient of the error function with respect to a parameter vector θ, which
represents the weights and is typically initialized randomly, and it moves in the direction of descending
gradient. Concretely, you start by filling θ with random values and then improve it gradually, taking one
small step at a time, attempting to decrease the cost function, until the algorithm converges to a minimum
(see figure A.8).
Figure A.8: Gradient descent [48].
An important parameter is the step size, which depends on the learning rate. If η is too small, the
algorithm will have to go through many iterations to converge, which will take a long time; if η is too
high, you might jump across the valley and end up on the other side, possibly even higher up than you
were before: this might make the algorithm diverge, with larger and larger values failing to find a good
solution.
When using gradient descent you should ensure that all features have a similar scale, or else it will
take much longer to converge.
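The whole BP loop above can be condensed into a short NumPy sketch: a single tanh hidden layer fed with a bias input, a linear output neuron, the squared-error function E, and gradient-descent updates of w(1) and w(2). The toy dataset, network size and learning rate are arbitrary choices for the example, and the gradient is averaged over the events rather than summed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sample: one input variable plus a bias input x0 = 1, targets y in {1, 0}
x_raw = np.concatenate([rng.normal(+1, 0.5, 200), rng.normal(-1, 0.5, 200)])
x = np.column_stack([x_raw, np.ones(400)])
y = np.concatenate([np.ones(200), np.zeros(200)])

n_var, n_h = 2, 5
W1 = rng.normal(0, 0.5, (n_var, n_h))   # input -> hidden weights  w(1)_ij
W2 = rng.normal(0, 0.5, n_h)            # hidden -> output weights w(2)_j1
eta = 0.05                              # learning rate

def forward(x):
    h = np.tanh(x @ W1)                 # hidden activations y(2)_j
    return h, h @ W2                    # linear output neuron y_ANN

for epoch in range(5000):
    h, y_ann = forward(x)
    delta = y_ann - y                   # dE/dy_ANN for E = 1/2 (y_ANN - y)^2
    grad2 = h.T @ delta / len(x)        # gradient w.r.t. w(2)
    # tanh'(z) = 1 - tanh(z)^2 enters the hidden-layer gradient
    grad1 = x.T @ (np.outer(delta, W2) * (1 - h**2)) / len(x)
    W2 -= eta * grad2                   # w <- w - eta * grad E
    W1 -= eta * grad1

h, y_ann = forward(x)
error = 0.5 * np.mean((y_ann - y)**2)
accuracy = np.mean((y_ann > 0.5) == y)
```

After training, the output error is small and the network separates the two classes with a cut at 0.5, the midpoint of the target encoding.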
A.2.4 BFGS
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method differs from
BP by the use of second derivatives of the error function to adapt the synapse
weights, through an algorithm composed of four main steps:
1. Two vectors are calculated: the vector of weight changes D, which
represents the evolution from one iteration of the algorithm (k−1) to the
next (k), and the vector Y of gradient errors.
2. Approximate the inverse of the Hessian matrix, H^{-1}, at iteration k by

H^{-1}_{(k)} = \frac{D \cdot D^T \cdot \left( 1 + Y^T \cdot H^{-1}_{(k-1)} \cdot Y \right)}{Y^T \cdot D} - D \cdot Y^T \cdot H + H \cdot Y \cdot D^T + H^{-1}_{(k-1)}

3. Estimate the vector of weight changes by

D_{(k)} = -H^{-1}_{(k)} \cdot Y_{(k)}
4. Compute a new vector of weights by applying a line search algorithm,
in which the error function is locally approximated by a parabola. The
algorithm evaluates the second derivatives and determines the point
where the minimum of the parabola is expected. The algorithm then
evaluates points along the line defined by the direction of the gradient
in weight space to find the absolute minimum. The weights at the
minimum are used for the next iteration.
The advantage of the BFGS method compared to BP is the smaller number
of iterations. However, because the computing time for one iteration is
proportional to the squared number of synapses, large networks are particularly
penalised [47].
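In practice one rarely codes BFGS by hand; as an illustration (assuming SciPy is available — this is not the TMVA implementation), a quasi-Newton BFGS minimisation of a simple least-squares error function, with its gradient supplied analytically, converges in few iterations:

```python
import numpy as np
from scipy.optimize import minimize

# Toy error function E(w): least-squares fit of a line, standing in
# for the network error function.
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x - 0.5 + rng.normal(0, 0.05, 100)

def error(w):
    r = w[0] * x + w[1] - y
    return 0.5 * np.sum(r**2)

def gradient(w):
    r = w[0] * x + w[1] - y
    return np.array([np.sum(r * x), np.sum(r)])

res = minimize(error, x0=np.zeros(2), method='BFGS', jac=gradient)
# res.x ends up close to the true parameters (2.0, -0.5)
```

The internal approximation of the inverse Hessian is what allows BFGS to take far fewer steps than plain gradient descent on the same problem.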
Variable ranking
The MLP neural network implements a variable ranking that uses the sum
of the squared weights of the connections between the variable's neuron in
the input layer and the first hidden layer. The importance I_i is given by

I_i = x_i^2 \sum_{j=1}^{n_h} \left( w_{ij}^{(1)} \right)^2, \quad i = 1, \ldots, n_{var}
Performance
Its main characteristics with respect to the BDT are: slower training, a
higher expected bias, and a high expected performance.
A.3 TMVA: Toolkit for Multivariate Analysis
TMVA is an integrated environment for machine learning, giving access to a
wide number of multivariate classification and regression algorithms. All
multivariate techniques in TMVA belong to the family of "supervised learning"
algorithms. Here the samples in a dataset are called events; the different
types of variables are included in a specific file format supported by ROOT,
called a Tree, with the .root extension. All information about the dataset
is included in the DataLoader and passed to a Factory object, which
organises the interaction between the user and the TMVA analysis steps.
It performs pre-analysis and preprocessing of the training data to assess
basic properties of the discriminating variables, then books, trains and tests
the selected classification methods. Each trained MVA method writes its
configuration and response to a result ("weight") file, which in the default
configuration has a readable XML format.
From this phase a .root file is produced, which can be read directly by means
of a TMVA function (TMVA::TMVAGui) that allows displaying a lot of
information: the input variable distributions, the variable correlations, the
efficiency of the classifier's cut, the ROC curves and some important visual
results about the booked classifiers (for example the BDT diagram or the
neural network architecture).
Then, to validate the developed algorithm, a Reader object is used: it reads
and interprets the weight files, and can be included in any C++ executable,
ROOT macro, or Python analysis job [46].
Appendix B
Performance of the muon reconstruction algorithms

This appendix reports all the results of the standard reconstruction
algorithms, Global and Tracker, mentioned in Section 4.6.1.
B.1 Signal and Background definition in the training phase

Figures B.1, B.2 and B.3 show the results for the efficiency of signal muons,
of fake muons and ε_{K,π→µ}, respectively.
The study of the background performance of the reconstruction algorithms is
reported in figures B.4 and B.5, for the fake rate and for muons from K or π
decay, respectively.
Starting from the Global algorithm, it is possible to observe that in the
endcap region (η > 1.4) the muons from K are the most efficient, with
86-88%; the muons not from K have 70-83%, and the minimum contribution
is provided by non-muons, with 0.8-1% efficiency. In the barrel region
(η < 1.4), instead, the efficiencies are all very high, between 80% and 99%:
values equal to 100% are a consequence of the numerator and the denominator
being equal. Observing the behaviour as a function of pT, it is clear that the
muons from K reach the plateau already at 3.5 GeV and the minimum
contribution is provided by non-muons. In general, however, for values
greater than 5 GeV the efficiencies are higher.
The Tracker algorithm has a reverse profile, with a decrease in the barrel
region: this means that in the barrel, when there is a drop, it is more difficult
to reconstruct a muon from a K, for example, than with the Global method.
The reason could be that in the barrel Global also has a greater number of
events than Tracker, so probably these muons are born from the interaction
of the kaon with the absorber between the HCAL and the DTs. They
therefore leave hits only in the muon system and not in the tracker. To
confirm this hypothesis, I observed the coordinates of the primary vertex in

(a) (b)
Figure B.1: Global and Tracker efficiency vs η (a) and pT (b), for muons
from τ → 3µ decay.

(a) (b)
Figure B.2: Global and Tracker fake rates vs η (a) and pT (b).
(a) (b)
Figure B.3: Global and Tracker efficiency vs η (a) and pT (b), for muons
from K or π decay.

(a) (b)
Figure B.4: Global and Tracker muon fake rate as a function of η (a) and
pT (b), for particle guns.
the transverse plane, to understand whether these muons are created close
to the bunch crossing or produced further forward, and so whether or not
they originate in the muon system. For the muons from K, I found a dxy
distribution with a tail up to 2 cm. Moreover, the study of the variable
simFlavour, which stores the flavour of the muon, confirmed that these
muons were not primary muons.
(a) (b)
Figure B.5: Global and Tracker efficiency for muons coming from K or π
decay as a function of η (a) and pT (b), for particle guns.
B.2 Validation phase results

The efficiencies of the standard reconstruction algorithms obtained in the
application phase are reported in figures B.6 and B.7.

(a) (b)
Figure B.6: Efficiency of the Global (a) and Tracker (b) algorithms vs η for
signal (DsPhiPi sample) and background (MinimumBias sample).

The signal efficiencies and the fake rates are summarized in table B.1.
(a) (b)
Figure B.7: Efficiency of the Global (a) and Tracker (b) algorithms vs pT
for signal (DsPhiPi sample) and background (MinimumBias sample).

Table B.1: Summary of the efficiencies for all the RECO algorithms used in
this study.
Bibliography
[1] De Angelis Alessandro and Pimenta Mário João Martins. Introduction
to particle and astroparticle physics: multimessenger astronomy and its
particle physics foundations; 2nd ed. Undergraduate lecture notes in
physics. Springer, Cham, 2018.
[2] O. Nachtmann. Elementary Particle Physics: Concepts and Phenomena.
1990.
[3] James William Rohlf. Modern Physics from A to Z. John Wiley and
Sons, New York, 1994.
[4] M. Tanabashi et al. Review of particle physics. Phys. Rev. D, 98:030001,
Aug 2018.
[5] TWiki. Summaries of CMS cross section measurements, 2019.
[6] Measurements of properties of the Higgs boson decaying into four leptons
in pp collisions at √s = 13 TeV. Technical Report CMS-PAS-HIG-16-041,
CERN, Geneva, 2017.
[7] Georges Aad et al. Evidence for the Higgs-boson Yukawa coupling to
tau leptons with the ATLAS detector. JHEP, 04:117, 2015.
[8] A. M. Sirunyan et al. Observation of Higgs boson decay to bottom
quarks. Phys. Rev. Lett., 121(12):121801, 2018.
[9] Albert M Sirunyan et al. Combined measurements of Higgs boson
couplings in proton-proton collisions at √s = 13 TeV. Eur. Phys. J.,
C79(5):421, 2019.
[10] Albert M Sirunyan et al. Observation of electroweak production of
same-sign W boson pairs in the two jet and two same-sign lepton final
state in proton-proton collisions at √s = 13 TeV. Phys. Rev. Lett.,
120(8):081801, 2018.
[11] Albert M Sirunyan et al. Evidence for the associated production of a
single top quark and a photon in proton-proton collisions at √s = 13
TeV. Phys. Rev. Lett., 121(22):221802, 2018.
[12] D. Contardo, M. Klute, J. Mans, L. Silvestris, and J. Butler. Technical
Proposal for the Phase-II Upgrade of the CMS Detector. 2015.
[13] John Butler. Searches for Dark Matter at the LHC. Dark matter searches
at the LHC. Technical Report ATL-PHYS-PROC-2018-055, CERN,
Geneva, Jun 2018.
[14] Roel Aaij et al. Search for Dark Photons Produced in 13 TeV pp Colli-
sions. Phys. Rev. Lett., 120(6):061801, 2018.
[15] A. Alavi-Harati et al. Observation of direct CP violation in K_{S,L} → ππ
decays. Phys. Rev. Lett., 83:22-27, 1999.
[16] Kazuo Abe et al. Observation of large CP violation in the neutral B
meson system. Phys. Rev. Lett., 87:091802, 2001.
[17] Roel Aaij et al. Measurement of CP asymmetry in D0 → K−K+ and
D0 → π−π+ decays. JHEP, 07:041, 2014.
[18] Christopher W. Walter. The Super-Kamiokande Experiment. pages
19-43, 2008.
[19] A. Abashian et al. The Belle Detector. Nucl. Instrum. Meth.,
A479:117-232, 2002.
[20] Bernard Aubert et al. The BaBar detector. Nucl. Instrum. Meth.,
A479:1-116, 2002.
[21] K. Hayasaka et al. Search for Lepton Flavor Violating Tau Decays into
Three Leptons with 719 Million Produced Tau+Tau- Pairs. Phys. Lett.,
B687:139-143, 2010.
[22] CMS Collaboration. Search for τ → 3µ decays using τ leptons produced
in D and B meson decays. 2019.
[23] CMS Collaboration. A Search for Beyond Standard Model Light Bosons
Decaying into Muon Pairs. 2016.
[24] Lyndon Evans and Philip Bryant. LHC Machine. JINST, 3:S08001,
2008.
[25] G. L. Bayatian et al. CMS Physics - Technical Design Report Volume
I:Detector Performance and Software. 2006.
[26] CMS Public Web. Public CMS Luminosity Information.
[27] LHC Commissioning. Peak Luminosity.
[28] The CMS Collaboration. Performance of CMS muon reconstruction in
pp collision events at √s = 7 TeV. Journal of Instrumentation,
7(10):P10002, Oct 2012.
[29] S. Chatrchyan et al. The CMS Experiment at the CERN LHC. JINST,
3:S08004, 2008.
[30] Bora Akgun. Performance of the CMS Phase 1 Pixel Detector. Technical
Report CMS-CR-2018-012, CERN, Geneva, Jan 2018.
[31] A. Dominguez et al. CMS Technical Design Report for the Pixel
Detector Upgrade. Technical Report CERN-LHCC-2012-016, CMS-TDR-11,
CERN, Geneva, Sep 2012.
[32] CMS Collaboration. Technical proposal for the upgrade of the CMS de-
tector through 2020. Technical Report CERN-LHCC-2011-006. LHCC-
P-004, Jun 2011.
[33] The CMS Collaboration. The performance of the CMS muon detector
in proton-proton collisions at √s = 7 TeV at the LHC. Journal of
Instrumentation, 8(11):P11002, nov 2013.
[34] The CMS Collaboration. Performance of the CMS muon detector and
muon reconstruction with proton-proton collisions at √s = 13 TeV.
Journal of Instrumentation, 13(06):P06015, jun 2018.
[35] The CMS Collaboration. The CMS trigger system. Journal of
Instrumentation, 12(01):P01020, jan 2017.
[36] CMS Collaboration. Pileup mitigation at CMS in 13 TeV data. 2019.
[37] S. Chatrchyan et al. Calibration of the CMS Drift Tube Chambers and
Measurement of the Drift Velocity with Cosmic Rays. JINST, 5:T03016,
2010.
[38] CMS Public Web. Performance of the CMS Muon Detectors in early
2018 collision runs.
[39] R. Fruhwirth. Application of Kalman filtering to track and vertex fitting.
Nucl. Instrum. Meth., A262:444–450, 1987.
[40] Muon Identification and Isolation efficiency on full 2016 dataset. Mar
2017.
[41] Muon identification and isolation efficiencies with 2017 and 2018 data.
Jul 2018.
[42] R. Brun and F. Rademakers. ROOT: An object oriented data analysis
framework. Nucl. Instrum. Meth., A389:81–86, 1997.
[43] Torbjörn Sjöstrand et al. An Introduction to PYTHIA 8.2. Comput.
Phys. Commun., 191:159–177, 2015.
[44] S. Agostinelli et al. GEANT4: A Simulation toolkit. Nucl. Instrum.
Meth., A506:250–303, 2003.
[45] Anders Ryd et al. EvtGen: A Monte Carlo Generator for B-Physics.
2005.
[46] Andreas Hocker et al. TMVA - Toolkit for Multivariate Data Analysis.
2007.
[47] Jens Zimmermann. Statistical Learning in High Energy and Astro-
physics. October 2005.
[48] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
An Introduction to Statistical Learning: With Applications in R.
Springer Publishing Company, Incorporated, 2014.
[49] Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and
TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.
O'Reilly Media, Inc., 1st edition, 2017.
Acknowledgments