Università degli Studi di Bari
Dipartimento Interateneo di Fisica "Michelangelo Merlin"
Nuclear, Subnuclear and Astroparticle physics curriculum
MASTER DEGREE THESIS
STUDY OF LOW MOMENTUM MUONS RECONSTRUCTION WITH CMS EXPERIMENT AT LHC
Thesis supervisors
Dott.ssa Anna Colaleo
Dott.ssa Rosamaria Venditti
Candidate
Leonarda Lorusso
Academic year 2018/2019
Contents
Introduction 1
1 The theoretical scenario at LHC 4
1.1 The Standard Model of the Particle Physics . . . . . . . . . . 4
1.2 QCD: Quantum Chromodynamics . . . 8
1.3 The electroweak interaction . . . . . . . . . . . . . . . . . . . 11
1.4 The Higgs mechanism . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Standard Model re-discovery at CMS . . . . . . . . . . . . . . 14
1.6 Open questions and Physics beyond SM (BSM) . . . . . . . . 17
1.6.1 Low momentum muons as a tool to probe New Physics 21
2 The CMS experiment at LHC 24
2.1 The Large Hadron Collider . . . . . . . . . . . . . . . . . . . . 24
2.1.1 Technical characteristics . . . . . . . . . . . . . . . . . 24
2.1.2 LHC between past and future . . . . . . . . . . . . . . 26
2.2 The CMS Experiment . . . . . . . . . . . . . . . . . . . . . . 28
2.2.1 The detector . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.2 The CMS muon system . . . . . . . . . . . . . . . . . . 34
2.2.3 The trigger system . . . . . . . . . . . . . . . . . . . . 41
2.3 The Run 2 data taking conditions . . . . . . . . . . . . . . . . 42
2.3.1 Future Perspectives . . . 43
3 Muon Reconstruction and Identification at CMS 46
3.1 The offline muon reconstruction at CMS . . . 46
3.1.1 Reconstruction of the hits and the segments . . . 47
3.1.2 Track reconstruction . . . 50
3.2 Muon identification at CMS . . . 54
3.3 Muon isolation . . . 56
3.4 Performance of Muon Identification and Isolation Algorithms . . . 58
4 Study of a low-pT muon identification algorithm with MVA techniques 61
4.1 Introduction . . . 61
4.2 Monte Carlo simulation in CMS . . . 62
4.2.1 Matching generated to reconstructed particles . . . 64
4.3 Signal description: Ds → τ + ντ → 3µ + ντ . . . 64
4.4 Study of the background composition . . . 66
4.5 Performance of the standard muon ID algorithms . . . 69
4.6 Implementation of the MVA discriminator . . . 71
4.6.1 Signal and Background definition . . . 74
4.6.2 Input Discriminating variables . . . 76
4.6.3 Algorithms setup and configuration . . . 79
4.6.4 Performance of the discriminators: Kolmogorov-Smirnov test, response and ROC curve . . . 82
4.7 Validation in the control region Ds −→ φπ and in the Minimum Bias events . . . 87
Conclusion 94
Appendix A Introduction to Multivariate Analysis and statistical learning 94
A.1 Definitions and basic concepts . . . 94
A.1.1 Supervised and Unsupervised learning . . . 95
A.1.2 Classification vs Regression . . . 95
A.1.3 How does one choose a statistical learning method? . . . 96
A.1.4 Problems . . . 98
A.1.5 Data Preprocessing . . . 99
A.2 Classification methods . . . 100
A.2.1 Linear Discriminant Analysis . . . 100
A.2.2 Boosted Decision Tree . . . 102
A.2.3 Multilayer Perceptron . . . 106
A.2.4 BFGS . . . 111
A.3 TMVA: Toolkit for Multivariate Analysis . . . 112
Appendix B Performance of the muon reconstruction algorithms 114
B.1 Signal and Background definition in training phase . . . 114
B.2 Validation phase results . . . . . . . . . . . . . . . . . . . . . 117
Bibliography 122
Introduction
The Standard Model (SM) is one of the greatest triumphs of science, representing more than a century of experimental discoveries and theoretical innovations in particle physics. Its predictions have been verified experimentally both in the electroweak and in the strong sector, up to the electroweak scale (~10² GeV). Nevertheless, what happens beyond this scale is still unknown and under investigation.
Various alternatives to the SM involve new symmetries, forces and constituents. The search for new physical phenomena, or for further confirmation of the validity of the Standard Model, is entirely dominated by the ability of each experiment to carry out different measurements accurately. The Compact Muon Solenoid (CMS) experiment at the LHC (Large Hadron Collider) has been designed in this direction. This work is focused on the crucial role played by muons in the searches for new physics scenarios with the CMS experiment. Within CMS, muons are detected by a dedicated system, composed of detectors based on different gas ionization technologies, which allows the muon momentum to be measured with good precision up to the TeV scale¹.
Very accurate and efficient algorithms have been developed to reconstruct and identify muons over a large momentum spectrum. The range pT > 30 GeV is dominated by muons from W and Z decays, used as standard candles to check the muon reconstruction and identification performance in the context of the Higgs boson searches.
In the range pT < 30 GeV, the most abundant source of muons is the semileptonic decay of heavy-flavor hadrons, characterized by final state topologies that are similar to those of the muons involved in the signature of some new physics processes, such as charged lepton flavor violating decays of heavier leptons and supersymmetric light bosons; these muons are accompanied by
¹ In this thesis, Natural Units will be used: c = ℏ = 1, where ℏ = h/2π = 6.58211889(26) · 10⁻²² MeV·s and c = 299792458 m·s⁻¹
a high rate of background muons, resulting from the decay of light-flavor hadrons (such as π and K) and from hadrons that are erroneously identified as muons. The relative weight of this background contribution is quite sensitive to the details of the muon identification algorithm. In this thesis, the capability of discriminating muons from the signal lepton flavor violating event, τ → 3µ, against background muons from pile-up interactions is investigated.
The CMS muon reconstruction algorithms have shown excellent performance for medium-pT (5-10 GeV), intermediate-pT (10-200 GeV) and high-pT (> 200 GeV) muons. The purpose of my thesis is to extend these standard methods to the low-pT region (< 3-4 GeV), for possible future applications in new physics searches at CMS.
In this thesis a feasibility study, exploiting Machine Learning techniques, has been carried out with the goal of discriminating signal muons from background ones. In particular, a multivariate discriminator has been developed, combining several input variables describing the quality of the muon reconstruction, the energy deposits in the calorimeters and the timing information, targeting very high efficiency on signal muons and an efficiency of a few percent on the background.
The thesis is structured in 4 chapters.
Chapter 1 describes the theoretical framework of the Standard Model and the basic concepts of different theories beyond the Standard Model, with particular reference to the object of this project: low-momentum muons as a tool to investigate new physics.
Chapter 2 presents the experimental framework: the LHC and the main
components of the CMS apparatus.
Chapter 3 explains the standard muon reconstruction and identification algorithms used in CMS.
Chapter 4 presents in detail the strategy of the analysis: in a first step I investigated the background composition, in order to identify the background muons; then I trained a Multilayer Perceptron algorithm to discriminate signal muons (muons from τ decay) from background ones. Finally, I validated its performance on a kinematically known sample, obtaining better results with respect to the standard muon identification algorithms.
Further details on the structure and operation of the Machine Learning algorithms used in the study are reported in Appendix A.
Finally, Appendix B collects a set of supporting results on the standard reconstruction algorithms at CMS.
Chapter 1
The theoretical scenario at LHC
The physics studied at the Large Hadron Collider (LHC) is the physics of the "infinitely small", the physics of the elementary particles. The Standard Model (SM) of Particle Physics is the best theoretical model so far to describe the elementary particles and three of the four known interactions existing among them. The SM has been successfully tested from the experimental point of view, with high precision, at lepton and hadron colliders in past years (SppS, LEP, Tevatron) and in the present day. Until the LHC era, only one piece of the theory was missing for its complete proof: the existence of the Higgs boson, whose discovery is one of the main achievements of the LHC and its experiments. More data have been collected to check whether the properties of this new particle imply physics beyond the Standard Model and whether new physics exists beyond the hundreds-of-GeV scale.
This Chapter is dedicated to the explanation of the theoretical perspective
that drives the LHC searches and aims to fathom the limits of the SM.
1.1 The Standard Model of the Particle Physics
The Standard Model, developed by Glashow, Weinberg and Salam in the 1960s, is a renormalizable quantum field theory (QFT) which incorporates quantum mechanics and relativity to describe natural phenomena at subnuclear scales, providing a description of the electromagnetic, strong and weak interactions.
The basic SM paradigm is that there is a set of elementary particles constituting matter, and its mathematical description at a fundamental level is based on the field concept, i.e. on wavefunctions associated with points in
spacetime, to which a local probability can be associated [1]. Classically, fields were just a mathematical abstraction: the real things were forces, due to a field that propagates in the form of a wave. With the quantum mechanics of fields, interactions are described as quantum excitations of relativistic fields: these excitations are particles, and their interactions happen through the exchange of other intermediary particles. So all particles are described by relativistic fields, but interaction fields and matter fields have different behaviors: matter fields (particles) interact by exchanging interaction fields, whose excitations are the intermediary particles. Whereas matter fields satisfy the Pauli exclusion principle (only one particle can occupy a given quantum state), obey Fermi-Dirac statistics and are called fermions, there is no limit to the number of identical and indistinguishable interaction field quanta that can occupy the same quantum state: they obey Bose-Einstein statistics and are called bosons.
The spin of a particle and the statistics it obeys are connected by the spin-statistics theorem, according to which fermions have half-integer spins and bosons have integer spins. For each known particle there is an anti-particle counterpart (antimatter field), with the same mass and opposite charge quantum numbers.
At the present energy scale there are 12 elementary matter fields and, according to our current knowledge, they can be divided into two big families: 6 leptons and 6 quarks. Each big family in turn can be divided into 3 generations, with similar properties but different masses. This is summarized in figure 1.1: the first 3 columns represent the matter fields, while the last one shows the interaction fields, called gauge bosons. The separate box represents the Higgs particle, whose role will become clear later.
At the current energy scale of the Universe, particles interact via 4 fundamental interactions. There are indications that this view is related to the present-day energy of the Universe: at higher energies, i.e. earlier epochs, the interactions would "unify"; the current theory foresees these interactions to be remnants of one single interaction that would occur at the extreme energies of the beginning of the Universe [1]. In increasing order of strength it is possible to distinguish:
• the gravitational interaction, acting between any pair of bodies and dominant at macroscopic scales.
• the electromagnetic interaction, acting between pairs of electrically charged particles, i.e. all matter fields excluding neutrinos.
Figure 1.1: Presently observed elementary particles [1]. The quark and lepton flavours are shown too.
• the weak interaction, acting between all matter fields with certain selection rules.
• the strong interaction, responsible for binding the atomic nuclei; more precisely, the color force acting among quarks.
Each of these interactions, according to the quantum mechanical view, manifests itself through the exchange of intermediate particles, called the quanta of the force field: the quantum of the electromagnetic force is the photon γ; the weak interaction is mediated by 3 bosons, one neutral Z and two charged W±; finally, the quanta of the strong force are 8 gauge bosons, called gluons.
The coupling of each particle to the boson(s) associated to a given interaction is determined by the strength of the interaction and by a quantum number called "charge". The gravitational charge of a particle is proportional to its mass; the weak charge is the weak isospin charge (±1/2 for fermions and 0, ±1 for bosons); the electrical charge is the positive or negative electric charge; the strong charge is called color, and comes in 3 types: red, green and blue.
The Standard Model is built on the concept of local gauge invariance of its Lagrangian. What does this mean?
The concept of invariance is strictly connected to that of symmetry: if a system is said, for example, to be invariant with respect to some transformations in space, one can say that the laws of physics are the same everywhere and hence that they are symmetric. More generally, a symmetry is an invariance under a transformation or a group of transformations [1].
The dynamical description of a particle system in the quantum world can be expressed by the so-called Lagrange function L = K − V, where K is the kinetic energy and V the potential one. In fact, the equations of motion, called the Euler-Lagrange equations, have the form:

d/dt (∂L/∂q̇_i) − ∂L/∂q_i = 0

with q_i the coordinates and L the Lagrange density function¹. Noether's theorem demonstrates that symmetries of this function with respect to given operations entail conservation laws and therefore conserved quantities, which are observed in natural phenomena.
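As a concrete illustration (not part of the thesis), the Euler-Lagrange equation above can be derived symbolically. The following sketch assumes the sympy library and uses a simple harmonic oscillator, L = ½mq̇² − ½kq², as the example system; the helper returns an equation equivalent to mq̈ + kq = 0:

```python
import sympy as sym
from sympy.calculus.euler import euler_equations

t = sym.symbols('t')
m, k = sym.symbols('m k', positive=True)
q = sym.Function('q')(t)

# Harmonic oscillator: L = (1/2) m qdot^2 - (1/2) k q^2
L = sym.Rational(1, 2) * m * q.diff(t)**2 - sym.Rational(1, 2) * k * q**2

# d/dt(dL/dqdot) - dL/dq = 0  should reduce to  m qddot + k q = 0
eq = euler_equations(L, q, t)[0]
assert sym.simplify((eq.lhs - eq.rhs) + (m * q.diff(t, 2) + k * q)) == 0 \
    or sym.simplify((eq.lhs - eq.rhs) - (m * q.diff(t, 2) + k * q)) == 0
```
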
Therefore the Lagrangian of the SM is invariant under gauge transformations, in the sense that it possesses a gauge symmetry, where a gauge transformation is defined in this way:

Aµ −→ Aµ + ∂µθ

with Aµ a generic field and θ an arbitrary function. If θ depends on the space-time coordinates x = (ct, x, y, z), the transformation is called local, otherwise global [2]. By Noether's theorem, the invariance of the Lagrangian under global gauge transformations is related to the global conservation of a charge of the system, while local invariance is related to its local conservation: the latter is what ensures the conservation law holds in the evolution of a system from the initial state to the final state.
In a theory of the interactions such as the SM, the equations of motion (which have the form of the Schroedinger, Klein-Gordon and Dirac equations) are modified to incorporate explicitly the coupling with the interaction fields. The introduction of these new terms makes the equations invariant under a combined local gauge transformation of the matter and interaction fields. Conversely, requiring that the matter quantum equations be invariant with respect to local gauge transformations within some internal symmetry group implies the existence of well-defined interaction fields, then called gauge fields.
¹ The Lagrange function L is usually written as the integral over the spatial coordinates x = (x, y, z) of the Lagrangian density: L(t) = ∫ d³x 𝓛(x)
For completeness, the gauge transformation of the wave function associated to the field Aµ can be written in a more general form as:

ψ(x) → e^{iθA} ψ(x)   (1.1)

where A is a generic unitary operator, e^{iθA} indicates the phase term and ψ is the matter field.
In more detail, the mathematical formulation of the SM is based on the local gauge invariance of its Lagrangian under the SU(3)_C × SU(2)_L × U(1)_Y group: the invariance under SU(3) provides a way to describe the color interaction, Quantum Chromodynamics (QCD); the symmetry group SU(2) × U(1) describes instead the weak and electromagnetic interactions, the so-called electroweak interaction.
Finally, saying that it is a renormalizable theory means that no divergent terms must be present in its formulation: as will be explained later, these infinities are absorbed by introducing a dependence of the coupling constants (one for each interaction) on the energy.
1.2 QCD: Quantum Chromodynamics
The strong interaction is modeled by QCD, Quantum Chromodynamics, a theory exploiting the invariance of the strong interaction with respect to rotations among 3 different elements in "color space". The minimal representation for such a symmetry is the non-Abelian gauge group SU(3)_C.
In general, SU(3) denotes the group of special unitary 3-dimensional matrices, which represent space-time dependent rotations in the complex plane, where "special" means that the determinant of the matrices is equal to 1. Since the SU(n) group has n² − 1 free parameters, and therefore n² − 1 generators, this group has 8 generators (λ_a/2, with λ_a the Gell-Mann matrices) that represent a color exchange and obey the following commutation rules:

[λ_a/2, λ_b/2] = i f_abc λ_c/2

where f_abc are the structure constants of the group and characterize the finite transformations in a suitable neighborhood of the unit transformation [2]. This means that the generators do not commute, and from here comes the non-Abelian character of the group. This mathematical structure derives from the conservation of the colour charge.
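These commutation rules can be verified numerically for a concrete case. A minimal sketch (not from the thesis), assuming numpy and using the first three Gell-Mann matrices, for which the structure constant is f_123 = 1:

```python
import numpy as np

# First three Gell-Mann matrices: the SU(2)-like subalgebra of SU(3)
l1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=complex)
l2 = np.array([[0, -1j, 0], [1j, 0, 0], [0, 0, 0]], dtype=complex)
l3 = np.array([[1, 0, 0], [0, -1, 0], [0, 0, 0]], dtype=complex)

def comm(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return a @ b - b @ a

# [l1/2, l2/2] = i f_123 l3/2 with f_123 = 1: the generators do not commute
assert np.allclose(comm(l1 / 2, l2 / 2), 1j * l3 / 2)

# Generators are traceless and Hermitian, as required for an SU(3) algebra
for l in (l1, l2, l3):
    assert np.isclose(np.trace(l), 0) and np.allclose(l, l.conj().T)
```
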
Equation 1.1 takes the form:

ψ(x) −→ e^{i g_s (λ_a/2) θ_a} ψ(x)

and the gauge fields needed to assure the invariance of the wave equations under these transformations are the gluons.
Quarks and antiquarks appear in the fundamental 3 and 3̄ representations of the gauge group SU(3)_C, the so-called flavour-SU(3) group (theory of the eightfold way), where 3 represents the colour number and 3̄ the anti-colour one: this is why one speaks of a "color triplet" [2]. These particles constitute the so-called hadrons, i.e. colour singlets that can be obtained by binding in a single particle all three colours, or all three anticolours, or a colour charge and the equivalent anticolour charge.
The only stable hadronic states are color-neutral, so free objects can't be colored and it is not possible to observe free quarks ("confinement" phenomenon). Hadrons can therefore be of two types: (anti)baryons, consisting of three (anti)quarks of different (anti)colours, and mesons, consisting of a quark of a certain colour and an antiquark which carries the corresponding anticolour.
Gluons are in the so-called adjoint representation, and thus their number is the same (8) as the number of generators of the SU(3)_C group: in this case one speaks of a "color octet". They are bi-colored particles: they carry color charge themselves and can interact with each other.
The dynamical structure of the group is expressed by the Lagrange density for SU(3):

L_QCD = ψ̄ (iγ^µ ∂_µ − m) ψ − g_s ψ̄ γ^µ (λ_a/2) G^a_µ ψ − (1/4) G^a_µν G_a^µν   (1.2)

where the first term is the free Dirac Lagrange density for a fermion field ψ of spin 1/2 at space-time position x and mass m, with γ^µ the Dirac matrices: it represents the free quark propagation; the last term represents the kinetic tensorial term for the gauge bosons; the middle one represents the interaction between the fermion fields and the gluons: G^a_µ are the vector fields (gluons), with a running from 1 to 8, and g_s represents the strength of the interaction, i.e. the strong coupling constant, often parametrized as:

α_s = g_s² / 4π   (1.3)

It is not really constant, but it runs with energy (running coupling constant).
The running strong coupling constant
In quantum physics, the vacuum is not really empty! It is the state of minimum energy and consists of pairs of virtual particles. From the electromagnetic point of view, a point-like charge polarizes the vacuum, creating electron-positron pairs which orient themselves as dipoles, screening the charge itself. The force carriers of the strong force (gluons) have color charges themselves and increase the number of force-carrying particles: their screening effect is opposite to that of the quarks (see figure 1.2).
Figure 1.2: Vacuum polarization with coloured particles [3].
Between the two effects, gluon screening dominates, thus the coupling constant becomes larger at lower energies and eventually becomes infinitely strong (color confinement). This behavior is shown in figure 1.3, together with the experimental measurements at different energy scales.
The dependence of α_s on the energy (momentum transfer) has the form:

α_S(Q²) = α_S(µ²) / [1 + (α_S(µ²)/12π) (11n_c − 2n_f) ln(Q²/µ²)]   (1.4)

where n_f indicates the number of flavours and n_c the number of colors (equal to 3), and µ is a scale parameter that must be determined experimentally. The decrease is only logarithmic, so the strength of the coupling remains sizable even at very large energies; asymptotically, however, the coupling vanishes and quarks behave as free particles (asymptotic freedom).
Thanks to the results from the Tevatron and from the LHC, the energy scales
at which αs is determined now extend up to more than 1 TeV [4].
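Equation 1.4 is easy to evaluate numerically. A minimal sketch, not from the thesis: the reference value α_s(m_Z) ≈ 0.118 and the fixed n_f = 5 are illustrative simplifications, since in practice n_f changes across the quark-mass thresholds:

```python
import math

def alpha_s(q2, alpha_mu2, mu2, nc=3, nf=5):
    """One-loop running strong coupling of eq. 1.4."""
    b = (11 * nc - 2 * nf) / (12 * math.pi)
    return alpha_mu2 / (1 + alpha_mu2 * b * math.log(q2 / mu2))

mz = 91.19           # GeV, reference scale
a_mz = 0.118         # illustrative alpha_s(m_Z)

# Asymptotic freedom: the coupling shrinks (logarithmically) as Q grows
print(alpha_s(200.0**2, a_mz, mz**2))  # below 0.118
print(alpha_s(10.0**2, a_mz, mz**2))   # above 0.118
```
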
Figure 1.3: Summary of measurements of αs as a function of the
momentum transfer Q, at several perturbative orders [4].
1.3 The electroweak interaction
The electromagnetic and weak interactions appear to be very different at low energies and are modelled using two different gauge theories. However, at energies of the order of 100 GeV, they can be considered two manifestations of the same force: in fact, in 1968 Glashow, Salam and Weinberg unified the two forces, leading to the so-called ElectroWeak (EWK) theory. The experimental proof of the validity of this theory came in 1983 with the discovery of the W± and Z0 bosons by the UA1 and UA2 collaborations in proton-antiproton collisions [2]. From the formal point of view, the unification is accomplished under an SU(2)_L × U(1)_Y gauge group.
In general, SU(2) denotes the group of special unitary 2-dimensional matrices
and is the group of the spin rotations; it has 3 free parameters and 3 gener-
ators, that are the Pauli matrices, and it is connected to the conservation of
the �weak isospin� quantity, T.
If, in equation 1.1, A is chosen to be one of the generators of the SU(2) group, then the associated gauge transformation corresponds to a local rotation in spinor space. The invariance of the wave equations under these local gauge transformations leads to the need for 3 gauge fields: W± and Z. SU(2)_L in particular refers to the left-handed fermion fields (or, equivalently, the right-handed anti-fermion fields) that participate in weak interactions, because here parity (inversion of the space axes) is not a symmetry of the group.
U(1) is the group of unitary 1-dimensional matrices and has 1 free parameter and 1 generator, the "weak hypercharge", connected to the electrical charge by the Gell-Mann-Nishijima relation: Q = T_z + Y/2, where T_z is the third component of the weak isospin. It is represented by a space-time dependent rotation in a complex plane, so that the multiplication of the state equation of a particle by a member of this group produces a phase change. The invariance under phase changes allows the formulation of the theory to be possible independently of the choice of phase. This invariance leads to the conservation of the weak hypercharge.
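The Gell-Mann-Nishijima relation can be checked against the known fermion charges. A small illustrative sketch (not from the thesis), using the standard left-handed doublet hypercharges Y = −1 for leptons and Y = +1/3 for quarks:

```python
from fractions import Fraction as F

def charge(t3, y):
    """Gell-Mann-Nishijima relation: Q = T_z + Y/2."""
    return t3 + y / 2

# Left-handed doublets: (nu, e)_L has Y = -1; (u, d)_L has Y = +1/3
assert charge(F(1, 2), F(-1, 1)) == 0           # neutrino
assert charge(F(-1, 2), F(-1, 1)) == -1         # electron
assert charge(F(1, 2), F(1, 3)) == F(2, 3)      # up quark
assert charge(F(-1, 2), F(1, 3)) == F(-1, 3)    # down quark
```
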
Calling W1, W2 and W0 the three gauge fields of SU(2), and hence W^a_µν (with a = 1, 2, 3) the field tensors of this group, and labeling B_µν the field tensor of U(1), it is possible to write the EWK Lagrange density as:

L_EW = Σ_families ψ̄ (iγ^µ ∂_µ) ψ − ψ̄ γ^µ (g W^a_µ τ_a/2 + g′ B_µ Y) ψ − (1/4) W^a_µν W^{aµν} − (1/4) B_µν B^µν   (1.5)
where τa are the Pauli matrices and Y the hypercharge.
The first term indicates the free Dirac Lagrangian for a massless fermion field; the last two are the kinetic parts related to the gauge fields; the middle terms represent the interaction between the gauge fields and the fermion fields, where g is the strength of the SU(2) coupling and g′ the strength of the hypercharge coupling, related to each other by the relation:
g sin θW = g′ cos θW = e
where e is the electric charge and θW the Weinberg angle, which describes the electroweak mixing between the gauge fields W0 and B0 that gives rise to the physically observed neutral fields, the Z0 and the photon.
At this point, the theory of the electromagnetic and weak interactions, as described, is unsatisfactory: it contains four massless gauge bosons, W^a and B, while experimentally only the photon is massless and the others are observed with masses of the order of 100 GeV. But while in QCD fermion masses can be added "by hand" without any symmetry violation, here an explicit mass term (a quadratic term in the Lagrangian) breaks the SU(2) gauge symmetry; moreover, the weak carriers must be massive to explain the weakness and short range of the interaction. Is there a way to generate the gauge boson and fermion masses without violating gauge invariance?
The answer is in the so-called Higgs mechanism.
1.4 The Higgs mechanism
R. Brout, F. Englert and P. Higgs in the 1960s formulated an elegant theory that introduces the masses of the particles, requiring the presence of a new massive scalar particle, called the Higgs boson. The mechanism by which this new particle appears is commonly called the Higgs mechanism and it is related to spontaneous symmetry breaking (SSB): the system reaches a state of minimum energy (vacuum state) in which part of the symmetry is hidden from the spectrum. As a consequence, the gauge bosons become massive and appear as physical states. This is possible by introducing the following gauge invariant potential in the theory:
V(ϕ) = −µ² ϕ†ϕ + λ (ϕ†ϕ)²   (1.6)

where µ² and λ are both positive constants and ϕ is a scalar doublet field, defined as ϕ(x) = (ϕ1(x), ϕ2(x))ᵀ, where ϕ1(x) and ϕ2(x) are complex scalar fields.
With this choice of signs, the minimum of the potential lies at non-zero values of the field: in the ground state the system sits in one of a set of degenerate, asymmetrical positions of minimum. None of these equilibrium positions, taken alone, shows the symmetry of the potential under SU(2) transformations, so the symmetry is spontaneously broken. In other words, once an equilibrium position is chosen, the symmetry of the potential becomes hidden [2].
Expanding the potential around the minimum points, the problems with the masses are solved, because additional terms, representing the interaction of the Higgs boson with the gauge bosons and the fermions, appear in the Lagrangian; they are the so-called Yukawa terms. Computing the vacuum expectation value of these terms, which in quantum mechanics corresponds to the ground state of the system, it is possible to obtain the particle masses:

m²_W = ρ₀² e² / (4 sin²θ_W)
m²_Z = ρ₀² e² / (4 sin²θ_W cos²θ_W)   (1.7)
m_f = c_f ρ₀ / √2
where ρ₀ = √(µ²/λ) is the vacuum expectation value of the Higgs field; c_f is the Yukawa coupling of the fermion f, a free parameter of the theory that without loss of generality is taken positive; the subscript f indicates the fermion type.
From the interaction of the Higgs boson with itself its mass term also appears:

m²_H = 2λρ₀²

and, because of its dependence on the λ parameter, which corresponds to the Higgs field self-coupling, its value is not predicted by the theory.
The Standard Model Lagrangian finally can be obtained by adding all the mentioned contributions (Yukawa terms, kinetic terms and mass terms) and its free parameters are in total 5: g, g′, c_e, µ and λ, or equivalently e, sin θ_W, m_e, m²_W and m²_H, which are experimentally measured.
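The tree-level mass relations of equation 1.7 can be evaluated with representative numbers. The following sketch is illustrative only (the inputs ρ₀ ≈ 246 GeV, sin²θ_W ≈ 0.2312 and α ≈ 1/128 are standard reference values, not taken from the thesis), and yields masses close to the measured m_W ≈ 80.4 GeV and m_Z ≈ 91.2 GeV:

```python
import math

# Illustrative reference inputs (not from the thesis)
alpha_em = 1 / 128.0    # electromagnetic coupling near the m_Z scale
sin2_thw = 0.2312       # sin^2 of the Weinberg angle
rho0 = 246.22           # Higgs vacuum expectation value, GeV

e = math.sqrt(4 * math.pi * alpha_em)
cos_thw = math.sqrt(1 - sin2_thw)

# Tree-level masses from eq. 1.7
m_w = math.sqrt(rho0**2 * e**2 / (4 * sin2_thw))
m_z = math.sqrt(rho0**2 * e**2 / (4 * sin2_thw * (1 - sin2_thw)))
print(f"m_W ~ {m_w:.1f} GeV, m_Z ~ {m_z:.1f} GeV")
```
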
1.5 Standard Model re-discovery at CMS
Since the main tool of this thesis for the investigation of the SM is the CMS experiment at the LHC, it is appropriate, for completeness, to report its contribution to the direct search for SM signals (the apparatus itself is described in Section 2.1). Figure 1.4 summarizes the "state of the art" of its research.
The CMS program was extensive, ranging from the electroweak sector (single W, Z production cross sections and associated production) to the strong sector (top quark production), up to the most competitive challenge: the Higgs boson existence.
The ATLAS and CMS experiments announced the observation of a 125 GeV Higgs boson in July 2012. The discovery has been confirmed with the data collected in Run 2. Figure 1.5 shows the 4-lepton (muons, electrons) invariant mass spectrum, obtained in the H → ZZ search, with the data collected by CMS during Run 2.
The first direct probe of the fermionic couplings was the Higgs boson decay to τ leptons, observed in the CMS results obtained at the end of Run 1 [7]. During Run 2, the increased center-of-mass energy and the larger data set made it possible to probe other channels. Over the past year, evidence for the Higgs boson decay to bottom quarks has been obtained and the production of the Higgs boson together with top quarks has been observed [8]. The most accurate way to summarize what we currently know about the interaction of the Higgs boson with the other particles of the SM is to compare its interaction strength with the mass of each particle, as shown in figure 1.6.
Figure 1.4: Summary of measurements of SM cross sections in Run 1 and Run 2 [5].
Figure 1.5: Mass spectrum of the 4 leptons in the full mass range. The 3 peaks are, in order of increasing mass, the decay of the Z, the Higgs boson decaying into Z(l+l−)Z∗(l′+l′−) and di-boson production of two Z(l+l−). Points with error bars represent the data and the stacked histograms represent the expected signal and background distributions [6].
Figure 1.6: Higgs boson coupling strength to each particle (error bars) as a function of the particle mass, compared with the SM prediction (blue dotted line) [9].
This clearly shows that the Higgs interaction strength is tied to the mass of the particles: the heavier the particle, the stronger its interaction with the Higgs field. This is one of the main predictions of the Higgs mechanism in the Standard Model. These couplings are in excellent agreement with the SM prediction over a range covering 3 orders of magnitude in mass. Deviations from these predictions could be a hallmark of new physics.
Recent highlights in the CMS research plan include the first observation of the electroweak production of same-charge W boson pairs [10], the first evidence for the production of a top quark with a photon [11], and detailed investigations of single top quark and top quark pair production as a function of the event characteristics, which can be used to measure SM parameters, such as the top quark mass and the strong coupling constant. These studies are still in progress.
Most of the already published CMS results from Run 2 are based on data
recorded in 2015 or 2016, while the total number of proton-proton collisions
accumulated in Run 2 is more than three times larger. The large new dataset will be used to expand the direct searches for new physics, in particular for rare events with unusual signatures such as long-lived, heavy particles (see the next section for more details).
1.6 Open questions and Physics beyond SM (BSM)
The primary way to uncover new physics is to explore the high-energy frontier at colliders, where heavy new particles can be directly produced and studied. The physics program at the LHC in fact focuses on answering fundamental questions in particle physics. What is the origin of the masses of the elementary particles? What is the nature of the dark matter we observe in the Universe? Are the fundamental forces unified? Do the properties of matter and antimatter differ?
The hierarchy problem
The first of these questions seems to have found an explanation with the discovery
of the Higgs boson of mass ∼125 GeV, but at the same time another question arises, the so-called hierarchy problem.
The SM introduces particle masses through SSB caused by the Higgs field;
within the theory, these masses differ from the fundamental values (or
bare masses, the masses at the Planck energy scale), and quantum corrections
due to the presence of virtual particles are introduced to account for this
difference: this prescription is known as renormalization. In the case of the
Higgs boson, these corrections are much larger than the measured
mass, and consequently the bare-mass parameter of the Higgs must
be "fine-tuned" in order to cancel such quantum corrections. However, the
level of fine-tuning required in the SM is considered unnatural. Therefore new
physics is expected to be found at the electroweak scale, about 10² GeV.
The unification issue: SUSY and GUT
New physics should appear at a mass scale not too far from 1 TeV to tame the
growth of the Higgs mass corrections mentioned in the previous section. This energy scale is that of
a powerful extension of the SM, called SUperSYmmetry (SUSY), which
predicts that every ordinary fundamental particle has a partner particle: a
bosonic partner for each fermion and vice versa. These SUSY partners play
an important balancing role in the description of nature and solve the
hierarchy problem. In SUSY the Higgs sector is modified, so that the
properties of the Higgs boson can deviate from those expected in the SM, and
additional Higgs bosons are predicted.
The 1 TeV energy scale is accessible at LHC but, to date, no SUSY particles
have been observed, and confidence limits have been set on the production
cross section and mass [12]. The search for SUSY below the TeV scale is one of
the physics goals of the next LHC runs.
Present-day experiments do not only test whether the properties
of the Higgs boson are in line with those predicted by the SM, but
specifically look for properties that provide evidence of new physics. For
example, by constraining the branching ratio with which the Higgs boson
decays into combinations of invisible or unobserved particles, stringent limits
are set on the existence of new particles with masses lower than that of the
Higgs boson. So far, none of these searches has found anything unexpected
[9]: the challenge is still ongoing.
Another theory proposed to extend the Standard Model is the Grand Unification
Theory (GUT).
The SM is constructed on the basis of a gauge group which introduces three
distinct coupling constants; these have to be measured experimentally and
change with energy, as in any renormalizable theory (see Section 1.2),
but a true unification is not achieved (see figure 1.7, left).
GUTs foresee a unification of the EW and strong forces within larger
gauge groups, such as SU(5), at energy scales ΛGUT ∼ 10¹⁵-10¹⁶ GeV (figure 1.7, right),
which are close to the scale of quantum gravity, ΛPlanck ∼ 10¹⁸ GeV [1], where gravitational effects become important.
The SM does not provide answers to the remaining questions above.
It is a successful theory and so far it has provided accurate predictions,
verified experimentally over the last half century at the electroweak
scale. Nevertheless, a description of what happens beyond this scale is missing.
All these unanswered questions suggest that it cannot be considered the
exhaustive theory of particle physics, but only a "low energy" approximation
of a more fundamental theory.
Figure 1.7: Evolution of the SU(3)×SU(2)×U(1) gauge couplings to high
energy scales, with the renormalization group equations of the SM (left) and
of the SUSY generalization of the SM (right) [1].
Dark Matter
Cosmological and astrophysical observations show that the SM describes only
15% of the total matter present in the universe; the remaining 85% is Dark
Matter (DM) [12]. Several candidates exist for dark matter.
The favoured one is the Weakly Interacting Massive Particle (WIMP), an
electrically neutral, colorless, stable particle with a mass in the range of the
electroweak scale [13].
The DM-SM interaction can be probed in several complementary and
interdisciplinary ways: Direct Detection (DD) experiments look for evidence
of DM-nucleus elastic scattering; Indirect Detection (ID) experiments search
for SM particles from DM annihilation or decay; and detection at colliders.
The experiments at LHC in fact rely on detecting dark-sector particles produced
in pp collisions, essentially exploiting the "E_T^miss + X" or "Mono-X"
signature, where X stands for SM particles that tag the event and E_T^miss is
the missing energy² associated with the invisible particles.
²In the presence of weakly interacting particles, such as neutrinos, which can
escape detection, the conservation of the total transverse momentum, expected in
head-on collisions, cannot be verified from the visible particles alone. The missing
transverse energy (MET or E̸_T) is therefore introduced, defined as
$\sqrt{\left(-\sum_i p_x^i\right)^2 + \left(-\sum_i p_y^i\right)^2}$, where the sum runs over
all the particles detected in the collision; in natural units (c = ℏ = 1) energy and
momentum have the same dimension.
From an operational point of view, MET is evaluated as the vector sum in the transverse
plane of the energy deposits detected by calorimeters.
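The definition in the footnote can be sketched in a few lines; this is a minimal illustration with made-up particle momenta, not CMS reconstruction code:

```python
import math

def met(particles):
    """Missing transverse energy: magnitude of minus the vector sum
    of the transverse momenta (px, py) of all detected particles."""
    sum_px = sum(px for px, _ in particles)
    sum_py = sum(py for _, py in particles)
    return math.hypot(-sum_px, -sum_py)

# Illustrative event: three visible particles with (px, py) in GeV.
visible = [(30.0, 10.0), (-20.0, 5.0), (0.0, 0.0)]
print(met(visible))  # sqrt(10^2 + 15^2) ~ 18.03 GeV
```

A fully balanced event gives MET = 0; a large MET hints at one or more undetected particles.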
There have been numerous searches for DM as WIMPs, at the LHC and else-
where, but no evidence for WIMPs has emerged [13]. This motivates looking
beyond the WIMP paradigm.
In general, dark sectors can feature new particles and forces with signatures
not found in the WIMP scenario: one such new particle is a "dark photon".
A compelling dark-force scenario in fact involves a massive dark photon, A′,
whose coupling to the electromagnetic current is suppressed relative to that
of the ordinary photon, γ, by a factor ε. Theoretically the dark photon
does not couple directly to charged SM particles; however, a coupling may
arise via "kinetic mixing" between the SM hypercharge and A′ field-strength
tensors. This mixing provides a potential "portal" through which dark photons
may be produced, if kinematically allowed [14].
Another DM candidate is the neutralino, the lightest SUSY particle, a
mixture of the superpartners of the Higgs and electroweak gauge bosons.
Particle physics considerations alone require the neutralino to be electrically
neutral, effectively stable and weakly interacting, with a mass of order 100
GeV. Remarkably, these properties are consistent with the possibility that
the thermal relic density of neutralinos makes up most of the missing mass
of the universe.
No evidence for DM has been observed so far but there is much more phase
space to be explored [13].
Matter-antimatter asymmetry
Another fundamental physical phenomenon not explained by the SM is related
to the matter and antimatter abundances in the universe. The SM predicts
that they should have been created in almost equal amounts, but the Universe
seems mostly made of matter, practically without antimatter. This implies
a particle-antiparticle asymmetry and suggests that CP (the combination of
charge and parity transformations) may not be a symmetry of all interactions.
Despite the phenomenological success of the description of charge-parity
symmetry violation (CPV) [15][16][17] in the SM, in agreement with experimental
results, the predicted violation is small, not enough to explain the
observed asymmetry! There must probably be additional sources of CPV
besides those currently known.
The recent evidence from solar and atmospheric anomalies, which proves that neutrinos
have finite masses, suggests another possible source of CPV in the lepton
sector, although the expected size of the effect is small. The Super-Kamiokande
experiment [18] in Japan has in fact observed νµ ↔ ντ oscillations, which can
happen only if neutrinos of different flavors have different masses. However, in
the original SM formulation, the neutrino masses and the lepton mixing angles
(described by the so-called Pontecorvo-Maki-Nakagawa-Sakata, PMNS, matrix)
are not included as parameters: this is an experimental result that cannot be
explained by the SM.
Neutrino oscillations open the door to lepton flavor violation (LFV) channels and
motivate searches for new physics in rare or highly suppressed flavor-changing
neutral-current reactions. In general, "family" or flavor number is not
a symmetry of the Lagrangian, unlike electric charge: quark family number is
violated in weak decays, a phenomenon described by a 3x3 matrix called the
Cabibbo-Kobayashi-Maskawa (CKM) matrix. Moreover, as just mentioned,
it is violated in the neutral lepton sector. Nevertheless, flavor violation has
never been observed in the charged lepton sector. Most "natural" new physics
models predict that it should already have been seen, even if small. So the
rates of Charged LFV (CLFV) processes are expected to provide information
regarding the nature of new physics.
1.6.1 Low momentum muons as a tool to probe New
Physics
Very soft muons, with pT < 3-4 GeV, can be considered an interesting
probe of the presence of new physics. In this paragraph I will discuss
two physics cases that triggered the interest of the scientific community,
in which low momentum muons play a crucial role.
o Charged Lepton Flavor Violation τ → 3µ
Leptonic flavour conservation is so far a phenomenon observed experimentally
without a corresponding symmetry in the SM: there are no known symmetries
that strictly forbid lepton-flavor violating decays, such as ℓ → ℓ′γ
or ℓ → 3ℓ′. In the SM, due to neutrino oscillations, such decays are possible,
albeit with extraordinarily small branching fractions. At the LHC, the
τ → 3µ decay is one of the "cleanest" LFV decay channels. The currently
best experimental upper limit, set by Belle [19], is B(τ → 3µ) < 2.1 × 10⁻⁸
at 90% CL, while BaBar [20] set BR(τ → 3µ) < 3.3 × 10⁻⁸ [21].
The main motivation that focuses our attention on this τ decay type is that
the τ → 3µ decay has the advantage of a very clean final-state topology; its
BR is kinematically favoured and seems enhanced in some BSM models.
Moreover, it can be studied at pp colliders; indeed, LHC is a factory of τ leptons,
which are produced in abundance, O(10¹¹)/fb⁻¹, as shown in table 1.1.
Table 1.1: The expected inclusive number of τ leptons produced in D and B
meson decays at LHC. Numbers are from PYTHIA [22].

Process                          | Number of τ leptons (33 fb⁻¹)
pp → cc̄ + … , D → τν            | 4.0 × 10¹² (95% Ds, 5% D±)
pp → bb̄ + … , B → τν + …        | 1.5 × 10¹² (44% B±, 45% B⁰, 11% B⁰s, 0% B±c)
pp → bb̄ + … , B → D(τν) + …     | 6.3 × 10¹¹ (98% Ds, 2% D±)

The dominant sources at LHC are the Ds and the various B mesons; W and
Z production will provide a considerably smaller number (8 × 10⁸ per year),
but with more energetic taus.
Searches for τ → 3µ at colliders have been carried out and upper limits have been
set: LHCb (Run 1) obtained BR(τ → 3µ) < 4.6 × 10⁻⁸ and ATLAS (Run 1) BR(τ → 3µ) < 3.8 × 10⁻⁸, both at 90% CL.
Until the beginning of this year, the great absentee in this panorama was
CMS. In March 2019 its first public result was released: with 2016 data
(Run 2, 33 fb⁻¹ of integrated luminosity) an upper limit BR(τ → 3µ) < 8.9
× 10⁻⁸ at 90% CL was obtained [22], using the heavy-flavour channel (B and D decays).
The pT spectrum of muons coming from τ decay in the HF channel is particularly
soft (as will be shown in Chapter 4) and boosted in the forward direction.
The situation for τ → 3µ, although more promising, is still very challenging.
The present work has been developed in this context: I studied the background
composition and I provided an approach to discriminate these events
from signal events, taking advantage of the most recent multivariate analysis
techniques. The study is presented in Chapter 4 of this Thesis.
o Muons as a tool to test SUSY and Dark sector
Recently the CMS, ATLAS [23] and LHCb [14] experiments published results
on the search for new light bosons with mass in the range 0.25-8.5
GeV/c² [23], whose presence could be interpreted as a dark matter candidate
(dark photon, γD) or as a first hint of the lightest scalar foreseen by the Higgs
sector of some extensions of the SUSY models. The search looked for
the decay of the new boson into 4 muons, characterized by a very soft pT spectrum.
Also in this context the detection and identification of soft muons plays a
crucial role, and the muon identification developed in the present work could
be an additional tool for this kind of search.
This search has been performed on the data collected by the different experiments
in 2016. So far no excess has been observed in the data, and a model-independent
upper limit on the product of the cross section, branching fraction
and acceptance has been derived [23].
Nevertheless, these searches are still ongoing with the 2017/18 datasets and
are one of the physics goals of the next LHC runs.
Chapter 2
The CMS experiment at LHC
The Large Hadron Collider (LHC) [24], located in the world's largest particle
physics laboratory, the European Organization for Nuclear Research (CERN),
is the most powerful particle accelerator built to date. Many scientific successes
have been achieved since its start-up, but many open challenges still remain.
The Compact Muon Solenoid (CMS) is one of the main experiments at LHC,
designed primarily for the search for new physics signatures in proton-proton
(pp) collisions.
This chapter is dedicated to the description of the CMS detector and its
operational strategy, with greater emphasis on the muon reconstruction and
identification techniques, the main topics of my thesis.
2.1 The Large Hadron Collider
The LHC is a circular hadron collider, designed with the main goal of operating
at the TeV energy scale, corresponding to the energy scale of electroweak
symmetry breaking and of the Higgs mechanism.
2.1.1 Technical characteristics
The LHC was built in the tunnel that hosted the LEP electron-positron
accelerator: a 27 km long ring-shaped tunnel at a depth of about 100 m
underground. Two high-energy proton beams travel in opposite directions,
at speeds close to that of light, inside two tubes kept at ultra-high vacuum,
called beam pipes, before colliding.
The protons in the beams are distributed in "packets" or bunches, about 5 cm
long and 10 µm across, separated by 25 ns time steps,
corresponding to about 750 cm in spatial separation. The collision of two bunches
is called a bunch crossing (BX) and takes place at a rate of 40 MHz [10]. The
beams are steered and focused using a total of 1232 dipole and 392 quadrupole
superconducting magnets, cooled with liquid helium to a temperature T
= −271.3 °C (1.9 K), colder than outer space (2.7 K).
The proton beams reach the LHC ring after a series of pre-accelerators
bringing the protons to an energy of 450 GeV in three steps: the LINAC (LINear
ACcelerator) brings them to an energy of 50 MeV; the Proton Synchrotron
(PS) further accelerates them to 26 GeV; and finally the Super Proton Synchrotron
(SPS) injects them into LHC with an energy of 450 GeV [25].
The two proton beams then collide at four points along the ring, corresponding
to the positions of the four main LHC detectors/experiments: ATLAS (A
Toroidal Lhc ApparatuS), CMS (Compact Muon Solenoid), ALICE (A Large
Ion Collider Experiment) and LHCb (Large Hadron Collider beauty). Next
to these there are three other minor experiments, for a total of seven.
Figure 2.1 shows a complete diagram of the beam injection system.
Figure 2.1: Scheme of LHC and its experiments.
The number N of events per unit of time produced in pp collisions is expressed
as N = σ · L, where σ represents the cross section of the particular
process investigated and L the instantaneous luminosity (number of collisions
provided per unit of time and area). Assuming two proton beams with n bunches,
with respectively N₁ and N₂ protons per bunch, colliding at a frequency f,
the luminosity at LHC is defined as:

L = (f · n · N₁ · N₂) / (4π · σx · σy)

where σx and σy denote the transverse beam profiles of the bunches.
In order to achieve the design instantaneous luminosity of L = 10³⁴ cm⁻²s⁻¹,
each beam must be composed of 2808 bunches with 1.15 × 10¹¹ protons per
bunch, colliding at fBX = 40 MHz.
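The order of magnitude of the formula above can be checked numerically. In this sketch, the revolution frequency of 11245 Hz and the transverse beam size of ~16.7 µm are assumed nominal LHC values not stated in the text, and the 50 pb cross section is purely illustrative:

```python
import math

# Nominal LHC parameters; the beam size (~16.7 um) and revolution
# frequency (11245 Hz) are assumed values, not from the text.
f_rev = 11245.0              # revolution frequency [Hz]
n_bunches = 2808
N1 = N2 = 1.15e11            # protons per bunch
sigma_x = sigma_y = 16.7e-4  # transverse beam size [cm]

# L = f * n * N1 * N2 / (4 pi sigma_x sigma_y)
L = f_rev * n_bunches * N1 * N2 / (4 * math.pi * sigma_x * sigma_y)
print(f"L = {L:.2e} cm^-2 s^-1")   # ~1e34, the design luminosity

# Event rate N = sigma * L for an illustrative 50 pb cross section
# (1 pb = 1e-36 cm^2): a sub-hertz rate even at design luminosity.
sigma = 50e-36  # [cm^2]
print(f"rate = {sigma * L:.2f} Hz")
```

Note that f in the luminosity formula is the revolution frequency of a single bunch; the 40 MHz bunch-crossing rate is f_rev multiplied by the number of bunch slots.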
2.1.2 LHC between past and future
The LHC was designed to accelerate proton beams up to 7 TeV, resulting
in a centre-of-mass energy available for collisions of 14 TeV at a
maximal instantaneous luminosity of 10³⁴ cm⁻²s⁻¹. These design requirements
drove the choice of a proton-proton collider instead of a proton-antiproton one:
even though a p-p̄ machine has the advantage that both beams can be kept in
the same beam pipe, producing the number of antiprotons needed to
reach the desired luminosity is an unfeasible task.
LHC started operating in 2009. After a commissioning phase at 0.9 and 2.36
TeV, the first real data-taking periods took place in 2010-2011 with pp
beams at Ecm = 7 TeV and continued in 2012 at Ecm = 8 TeV.
This configuration, labeled "Run 1", lasted until 2012, when
it was necessary to suspend operations (Long Shutdown 1, or LS1) in
order to prepare for a second phase: "Run 2" labels the 2015-2018 period
of activity, following a series of improvements aimed at
increasing the center-of-mass energy to 13 TeV, with an instantaneous luminosity
of 8 × 10³³ cm⁻²s⁻¹, about 40 times greater than in the first period,
and then bringing it to almost 2 × 10³⁴ cm⁻²s⁻¹ (see figure 2.2).
In December 2018 another long shutdown (LS2) began, to
prepare the LHC to operate at an instantaneous luminosity further increased to
2 × 10³⁴ cm⁻²s⁻¹. A third operational phase, called "Run 3", will then
open, covering the period 2021-2023, after which a further
increase in integrated luminosity is expected (up to 3000 fb⁻¹, see figure 2.3) in what will be
called the High Luminosity LHC (HL-LHC), which should start in 2026. This
will be possible thanks to a series of modifications, principally the replacement
of the linear accelerator Linac 2 with the new Linac 4 and the installation of
more than 20 magnets with niobium-tin (Nb₃Sn) coils: a very fragile material,
but able to withstand higher magnetic fields than the niobium-titanium
(Nb-Ti) wire used in the current LHC magnets.

Figure 2.2: (a) Evolution of the peak luminosity at LHC between 2010 and
2018. (b) Total integrated luminosity during the years 2010-2018 at nominal
center-of-mass energy [26].
One of the main challenges for the LHC experiments, related to the change
of operating conditions in Run 3, will be dealing with the increased pile-up
(soft pp interactions producing low transverse momentum particles, see Section
2.3) and background, and with the possible deterioration of the detectors
caused by the very high radiation levels.
Figure 2.3: Instantaneous and integrated luminosity planned at LHC up to
2037 [27].
2.2 The CMS Experiment
The Compact Muon Solenoid (CMS) experiment is a general-purpose detector
(built for the investigation of a wide range of physics), with a cylindrical
geometry and forward-backward symmetry along the beam line. It covers
an extensive research program within the boundaries of the Standard Model
(including the search for the Higgs boson(s) in all decay channels) and
beyond (supersymmetry, extra dimensions and dark matter). It shares the
same objectives as the ATLAS experiment, but differs from it in many details,
thus avoiding the common systematic errors that would arise if a single
technique were used for both.
Coordinate system: conventionally, CMS uses a right-handed coordinate
system with the origin at the nominal collision point: the x-axis points
to the center of the LHC ring, the y-axis points upward, perpendicular
to the LHC plane, and the z-axis lies along the counterclockwise beam
direction, toward the Jura mountains from LHC Point 5. The azimuthal angle
φ is measured from the positive x-axis in the x-y plane, while the polar angle
θ is measured from the z-axis.
The interaction is commonly described by quantities invariant under longitudinal
Lorentz boosts; the energy and momentum of the outgoing particles in
the plane transverse to the beam direction are therefore dominant and are
denoted by ET and pT. For the same reason, a commonly used spatial coordinate
is the pseudorapidity η = −ln[tan(θ/2)], a good approximation of
the rapidity y, whose differences are invariant under Lorentz transformations
along the beam axis and which is more useful in hadron collider studies [25],
defined as y = ½ ln((E + p_z)/(E − p_z)), where E is the particle's energy
and p_z is the component of its momentum along the beam axis.
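The approximation η ≈ y for relativistic particles can be verified directly from the two definitions; the 10 GeV pion below is an illustrative example, not from the text:

```python
import math

def pseudorapidity(theta):
    """eta = -ln(tan(theta/2)), a function of the polar angle only."""
    return -math.log(math.tan(theta / 2))

def rapidity(E, pz):
    """y = 0.5 * ln((E + pz) / (E - pz))."""
    return 0.5 * math.log((E + pz) / (E - pz))

# Illustrative charged pion: p = 10 GeV, theta = 30 deg, m = 0.1396 GeV.
m, p, theta = 0.1396, 10.0, math.radians(30)
pz = p * math.cos(theta)
E = math.sqrt(p**2 + m**2)
print(pseudorapidity(theta), rapidity(E, pz))  # nearly equal, since E >> m
```

For massless particles the two quantities coincide exactly; the advantage of η is that it requires only the measured polar angle, not the particle's energy.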
2.2.1 The detector
The CMS detector is shaped like a cylindrical onion, 21.6 m long and 14.6
m in diameter, with several concentric layers of detectors built around the
beam pipe. Each layer is designed to detect a certain type of particle and
to measure its momentum or energy.
Figure 2.4: Perspective representation of the CMS detector, together with
all its components.
As shown in figure 2.4, going from the outermost layer to the heart of the detector,
the different detection systems are: the muon chambers (dedicated to muon
detection), the hadron calorimeter HCAL (designed to measure the energy of
hadrons by stopping them; it surrounds the collision point and prevents the
particles from escaping), the electromagnetic calorimeter ECAL (which measures
the energy of electrons and photons) and finally the central tracker (which
provides accurate momentum measurements).
However the fulcrum of the whole apparatus is a huge solenoid magnet, which
gives its name to the whole detector: a cylindrical coil of superconducting
wire (cooled to −268.5 °C), generating a 3.8 T solenoidal magnetic
field oriented along the beam line (CMS was designed to produce a magnetic
field of up to 4 T in the inner region; in order to maximize its lifetime, the
magnet is operated at 3.8 T). The high nominal value of the field is necessary
for determining the sign of high momentum particles. In fact, given the
considerable dimensions of the solenoid, 13 m long and 7 m in diameter, it is
possible to place the innermost detection systems (calorimeters + tracker)
inside it, so as to exploit the high magnetic field for accurate momentum
measurements even for high-momentum particles such as muons. Since the
greater the momentum of a particle, the smaller the curvature its track
undergoes in the magnetic field, outside the solenoid, where the muon chambers
are located, the magnetic field is confined by the iron return yoke and still
allows the muon trajectories to be tracked. This magnetic structure allows
measuring the momentum both inside the solenoid (tracking devices) and
outside it (muon chambers), exploiting the well known relation
p[TeV] = 0.3 · B[Tesla] · R[km], with R being the radius of the curved track.
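The relation above can be inverted to estimate how strongly a track bends; the momenta below are illustrative values chosen to span the range discussed in this thesis:

```python
def radius_of_curvature(pT_GeV, B_tesla):
    """R [m] from pT [GeV] and B [T], using p = 0.3 * B * R
    (the same relation as p[TeV] = 0.3 * B[T] * R[km])."""
    return pT_GeV / (0.3 * B_tesla)

# A 100 GeV muon in the 3.8 T CMS solenoid field:
print(radius_of_curvature(100.0, 3.8))   # ~87.7 m: an almost straight track
# A soft 3 GeV muon, the regime studied in this thesis:
print(radius_of_curvature(3.0, 3.8))     # ~2.6 m: a strongly curved track
```

This is why very soft muons are harder to reconstruct: their tracks curl significantly inside the detector volume, while high-momentum tracks are nearly straight and their curvature (hence momentum) is harder to measure precisely.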
Inner tracking system
The tracker, called the inner tracker, occupies the innermost position of the
entire detector and covers the range |η| < 2.5 [28], as shown in figure 2.5.
Its main task is to accurately and efficiently measure the trajectories of the
charged particles produced in pp collisions and to reconstruct the secondary
vertices and impact parameters for the identification of heavy hadrons, in an
environment with high radiation levels. These requirements are met thanks
to the high granularity and fast response of its silicon sub-detectors,
while keeping the amount of material as low as possible so as to limit
phenomena such as multiple scattering, bremsstrahlung and nuclear interactions.
It consists of a heart of 65 million pixel detectors, directly exposed to the
high intensity of particles: at 8 cm from the beam line the rate is ∼10 million
particles per square centimeter per second. The pixel detector therefore holds
a vital place in the reconstruction of very short-lived particles (such as beauty
hadrons).

Figure 2.5: Schematic view of the CMS tracking detector [29].

When a charged particle passes through a pixel, it deposits enough energy to
excite an electron of a silicon atom, creating an electron-hole pair.
Each pixel collects these charges on the surface by means of an electric field,
producing a small electrical signal that is recorded by an electronic chip.
Knowing which pixel has been "touched", we can reconstruct the particle
track. Since the detector is made of 2D pixels arranged in a number of
detection layers (3 layers in all), it is possible to build a 3D image of the
track. The silicon pixels provide a spatial resolution of about 10 µm for the
R-φ coordinates in the transverse plane and 20 µm for the z coordinate [25].
Immediately after the 3 layers of pixels, the particles pass through 10 layers
of strip detectors, out to a radius of 130 cm. The strips work in a similar
way to the pixels, and the charge collected in the form of an electrical pulse
is amplified and read out. The microstrips provide a resolution that depends on
the cell thickness, but is still better than 55 µm in the transverse
plane: in particular, the single-point resolution varies from 35 to 52 µm in
the R-φ direction and is about 52 µm in z [25].
The inner tracker and its electronics are constantly exposed to high radiation
but are designed to withstand it for ten years. However, to minimize
the damage, this part of CMS is kept at −20 °C to "freeze" any defects
and prevent their propagation.
During Run 1 some dynamic inefficiencies were observed in the pixel sub-detector,
of the order of 5% in the first barrel layer, due to the limited
readout bandwidth [30]. A new pixel detector was installed in
March 2017, equipped with one additional barrel layer and one additional
forward disk, as shown in figure 2.6: the innermost layer is closer to the
interaction point and the outermost one is further away from it.
Figure 2.6: Schematic drawing of the upgrade of the pixel detector. The
lower half shows the old pixel detector, the upper half represents the newly
installed detector [31].
An improvement of the efficiency is observed with the new pixel detector
with respect to 2016 data; the results are reported in figure 2.7.
The electromagnetic and hadronic calorimeters
A calorimeter in general has the main task of measuring the energy of the
particles passing through it, by means of a destructive process: the particle
interacts with the sensitive calorimetric material and produces a shower
that propagates and is stopped within the structure, allowing the collection of
a signal proportional to the energy of the incoming particle.
Electrons and photons, from prompt interactions or embedded in a hadron
or τ-jet, stop within the electromagnetic calorimeter (ECAL), precisely because
their main interaction channel is electromagnetic.
In CMS, the ECAL is a homogeneous calorimeter composed of lead tungstate
crystals (PbWO₄), very dense (8.28 g/cm³) and fast scintillating (80% of the light
is emitted within 25 ns). Each cell is 23 cm long (25.8 radiation lengths¹)
in the radial direction to ensure the complete development of the showers:
the result is a very compact, granular and radiation-resistant detector [25].
Although these crystals are mainly made of metal, the presence of oxygen
makes them highly transparent and "sparkly" when crossed by photons
or electrons: when these particles hit the heavy nuclei of the ECAL
crystals, they excite their electrons. When the atoms "relax", they emit photons
of blue-green wavelength, peaked at 420-430 nm, which
are collected and amplified by the photo-detectors glued to the back of each
crystal.

¹The depth over which the electron energy is reduced by a factor 1/e, due to
bremsstrahlung alone.

Figure 2.7: Hit efficiency of the pixel detector vs instantaneous luminosity
during Run 2, measured with 2017 data [30].
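The 25.8 radiation lengths quoted above follow directly from the crystal depth; in this check the radiation length of PbWO₄, X₀ ≈ 0.89 cm, is an assumed standard (PDG-style) material constant not stated in the text:

```python
import math

X0_pbwo4 = 0.89       # radiation length of PbWO4 [cm], assumed standard value
crystal_depth = 23.0  # ECAL crystal length [cm], from the text

n_X0 = crystal_depth / X0_pbwo4
print(f"{n_X0:.1f} radiation lengths")  # ~25.8, as quoted in the text

# Fraction of the initial electron energy surviving bremsstrahlung after
# the full crystal depth, using E(x) = E0 * exp(-x / X0):
print(f"{math.exp(-n_X0):.1e}")  # vanishingly small: full shower containment
```

The short radiation length is precisely what makes PbWO₄ crystals so compact for a homogeneous calorimeter.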
The electromagnetic calorimeter forms an intermediate layer between the
tracker and the hadronic calorimeter: in the latter, the energy of the particles
subject to the strong interaction is deposited.
In CMS, the HCAL is a sampling calorimeter, with layers of sensitive material
(fluorescent plastic scintillator) alternated with layers of dense steel or brass
absorber. When a hadron crosses the absorber, it interacts strongly with
the nuclei of the medium, producing secondary particles which
in turn interact, forming the hadronic shower. As the cascade develops, the
particles pass through the scintillator layers, causing the emission of light,
typically at blue-violet wavelengths. The scintillator shifts
it into the green spectral region, to which the photo-cathode windows
that convert it into an electrical signal are most sensitive.
As the HCAL must be thicker than the ECAL, since hadronic
showers are wider, the minimum amount of material needed to contain the
cascade is about 1 m. In order for such a structure to be hosted in a
"compact" detector like CMS, it is organized in 3 sections: barrel, endcap
and forward section. The barrel layers are in turn divided so that some
constitute the last detection layer inside the solenoid, while a few others are
external to it (the so-called outer calorimeter, HO) and improve the containment
of hadronic showers. Finally, two forward sections (HF) are positioned
at the two ends of CMS, about ±11 m away from the interaction point,
to collect the myriad of particles emerging from the collision region at shallow
angles with respect to the beam line. They are Cherenkov-based iron/quartz-fiber
calorimeters: the Cherenkov light emitted in the quartz is detected by
photomultipliers. HF ensures full geometric coverage for the measurement of
the transverse energy in the event [32].
The muon system
Only two kinds of particles manage to get beyond the HCAL: weakly interacting
particles, such as neutrinos, and muons. While neutrinos escape
detection and can be identified by means of missing energy measurements,
muons are charged particles which are tracked in the dedicated muon
chambers and whose momentum is determined from the curvature of their
tracks in the magnetic field.
The muon system covers the pseudorapidity region |η| < 2.4 and has
three main tasks: triggering on muons, identifying them, and improving the
charge and momentum measurements at high pT.
As muons are the protagonists of the thesis study presented here, a more
in-depth discussion of their detection system is given in the next section.
2.2.2 The CMS muon system
As its very acronym suggests, the most important task of CMS is the detection
of muons, which give a very clear signature of several new physics
processes. Muon detection is therefore a powerful tool for recognizing signatures
of interesting processes over the very high background rate expected at LHC.
Since muons can penetrate several meters of iron before interacting, unlike
most other particles they do not stop in the calorimeters. This is why the muon
chambers are placed in the outer layers of the cylinder.
The basic detection process used in the CMS muon system is gas ionization,
exploited using three different technologies: drift tubes (DTs), cathode strip
chambers (CSCs) and resistive plate chambers (RPCs), chosen on the basis of the
expected background² and the magnetic field. Figure 2.8 shows that the
background is higher in the endcaps, where the magnetic field is less intense.
Figure 2.8: Magnetic field map (left) and field lines (right) for a longitudinal
section of CMS, obtained with a 3.8 T magnetic field model [33].
Each of these detectors has a basic physical module called a "chamber".
The chambers are independently operating units, 1400 in total:
250 DTs and 540 CSCs track the position of the particles and provide a trigger,
while 610 RPCs form an additional trigger system that promptly decides
whether or not to keep the data of the acquired muon.
The use of these three different technologies defines three detector regions,
naturally set by the cylindrical geometry of CMS: the barrel at |η| < 0.9,
the overlap region at 0.9 < |η| < 1.2 and the endcap at 1.2 < |η| < 2.4.
The DTs are placed in the barrel, while the CSCs sit in the endcap disks that
close the ends of the barrel. The RPCs are divided between the two regions
and interleaved with the DTs and CSCs. All chambers are arranged to
maximize coverage.
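These η boundaries can be encoded in a small helper function; a minimal sketch (the function name and region labels are illustrative, not part of the CMS software):

```python
def muon_region(eta: float) -> str:
    """Classify a muon by pseudorapidity into the three regions of the
    CMS muon system defined by its cylindrical geometry."""
    a = abs(eta)
    if a < 0.9:
        return "barrel"    # DTs, plus RPCs
    if a < 1.2:
        return "overlap"   # DT/CSC overlap region
    if a < 2.4:
        return "endcap"    # CSCs, plus RPCs up to |eta| = 1.6
    return "outside"       # beyond the muon-system acceptance
```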
The term "station" refers to a set of chambers around a fixed value of the
2The background is mainly induced by a gas of neutrons (created by
hadron interactions with the material of the beam pipe), which produces mainly
photons, electrons and positrons.
Figure 2.9: Transverse section of a CMS quadrant, in the R-z plane. The
interaction point coincides with the origin of the axis system [34].
radial distance R (in the barrel) or of the distance along the z direction (in
the endcap). As can be seen in figure 2.9, there are 4 stations in the barrel,
MB1-MB4 (orange areas), and 4 in each endcap, ME1-ME4 (green areas),
interspersed with steel disks (dark gray areas). The RPCs (blue areas)
placed in the barrel are labelled RB1-RB4, those in the endcap
RE1-RE4. Along z, the DTs and RPCs in the barrel are divided into 5 "wheels":
Wheel 0 is centered at z = 0, Wheels W+1 and W+2 lie in the positive
direction of the axis and W-1 and W-2 in the negative one. In the same
way, in the endcaps the CSCs and RPCs are organized in "rings", labelled
ME1/n-ME4/n, where n increases with the radial distance from the beam
line. Figure 2.10 shows the amount of material thickness crossed by muons, as a
function of pseudorapidity.
Between Run 1 and Run 2, additional chambers were added in ME4/2 and
RE4 to increase redundancy (allowing the system to tolerate the loss or
failure of some components), improve efficiency and reduce the mis-identification
Figure 2.10: Material thickness in interaction lengths at various depths, as
a function of pseudorapidity [29].
rate. Furthermore, the trigger and read-out electronics were improved
as part of a larger trigger upgrade, including optical connections
between DT and CSC to increase bandwidth and ease maintenance; a
deeper segmentation was implemented in the scintillator layers of the
calorimeters using SiPMs [32] [10].
DT - Drift Tube chambers
The DT chambers are located in the barrel, where the magnetic field is mostly
uniform, with a strength below 0.4 T between the segments of the return yoke,
and where the muon rate is low. They cover the pseudorapidity region
|η| < 1.2 and are subdivided into 12 φ-segments forming 4
stations. Each station, on average 2 x 2.5 m, consists of 12 aluminum layers,
organized in three groups of four consecutive layers, each with a maximum of
60 tubes, called SuperLayers (SLs). Among these three groups, the intermediate
one measures the coordinate along the direction parallel to the beam line,
i.e. in the longitudinal plane (R-θ), while the two remaining SLs measure
the perpendicular coordinate in the bending plane (R-φ). The
chambers in MB4, however, have only these last two groups. A honeycomb structure
separates an R-φ SL from the other two SLs (figure 2.11).
These measurements exploit the ionization process in the gas.
Each tube, 4 cm wide, contains a gold-plated stainless-steel anode
Figure 2.11: DT chamber layout in a barrel station [25].
wire, at a voltage of +3600 V, within a volume of gas (85%/15% Ar/CO2
[33]). When a muon passes through the tube, it ionizes the gas atoms,
producing electron-ion pairs that move along the electric field lines. The
electrons drift to the positively charged wire and the induced
signal is read out by the electronics. By recording the arrival of the electrons
at the wire and computing the distance of the original muon from the anode wire
(exploiting the knowledge of the electron drift time3), the DTs provide the
two coordinates for the position of the muon (muon hit).
The maximum drift length is 2.0 cm and the single-point resolution is
about 200 µm [25].
CSC - Cathode Strip Chambers
The cathode strip chambers provide fast response times (thanks to the short
drift paths) and can be finely segmented, and they cope well with a non-uniform
magnetic field. For these reasons they are located in the disks
at the two ends of CMS, where the magnetic field is strong and non-uniform
and the muon arrival frequency is very high. They cover the pseudorapidity
region 0.9 < |η| < 2.4.
Each endcap has 4 chamber stations mounted perpendicular to the beam.
Trapezoidal in shape (figure 2.12), the chambers consist of 6 layers of positively
charged anode wire planes (2.9 kV for ME1/1 and 3.6 kV for the remaining
ones), crossed with negatively charged copper cathode strips, within a
volume of gas (50% CO2, 40% Ar and 10% CF4 [33]).
3Its value depends on the electron drift speed, which in turn depends on the
characteristics of the gas; in this specific case the drift speed is 55 µm/ns and the
maximum drift time is 400 ns [33].
Figure 2.12: Schematic view of a CSC chamber [25] (a) and orthogonal
sections of one CSC layer (b).
A muon crossing the gas volume produces ionization charges which, moving
toward the electrodes, create an electron-ion avalanche. This induces a charge
on the anode wires and an image charge on a group of cathode strips. Since
the strips and wires are perpendicular, the position of the particle is obtained
in two coordinates: the cathode strips provide a measurement
in the bending plane (R-φ), i.e. the azimuthal position at
which the muon crosses the gas volume, while the anode wires provide a
measurement in the radial direction.
The spatial resolution provided by each chamber is typically of the order of
100 µm [25].
CSCs of different sizes are used, ranging between 1.7-3.4 m in the radial
direction; while the CSCs in the innermost rings of stations 2, 3 and 4
subtend an angle of 20° in φ, all the others subtend an angle of 10°.
DT and CSC together cover the pseudorapidity interval |η| < 2.4,
guaranteeing good muon identification over the range 10° < θ < 170°.
A crucial property is that they can identify the bunch crossing that
generated the muon, trigger on the muon pT with good efficiency, and
reject background by means of temporal discrimination. Since the minimum
time interval between two bunch crossings is 25 ns, the muon chambers must
provide a fast, well-defined signal to trigger on muon tracks: to ensure an
unambiguous identification of the bunch crossing and a temporal coincidence
between track segments in the various muon stations, the signals must locally
have a temporal dispersion of a few ns, much smaller than the 25 ns spacing.
This is what is achieved in CMS.
RPC - Resistive Plate Chambers
RPCs, present both in the barrel and in the endcaps, are fast gaseous detectors
with excellent time resolution (1 ns), located in the pseudorapidity region
|η| < 1.6. They are double-gap chambers (see figure 2.13), consisting of two
pairs of parallel electrodes (anode and cathode) made of a highly resistive
plastic material (graphite-coated Bakelite), separated by a volume of gas
(95.2% Freon, 4.5% isobutane and 0.3% SF6).
Figure 2.13: Schematic view of a double-gap RPC structure. The read-out
strips in the barrel-region chambers run along the beam direction [25].
When a muon passes through the chamber, electrons are ripped from
the atoms of the gas and in turn hit other atoms, causing an "avalanche" of
electrons. The chambers operate in avalanche mode, i.e. they prevent the
formation of streamers and are suitable for operation even at high rates (up
to 10 kHz/cm2 [25]).
RPCs provide fast responses with good time resolution but with a coarser
spatial resolution than the CSCs and DTs. They therefore allow the correct
bunch crossing to be identified unambiguously; they also provide a muon
trigger complementary to that of the DTs and CSCs, supplying timing
information, since hits in multiple chambers are required to be simultaneous.
The time assigned to the muon hits, once the event has been collected, is
called the "offline time": for muons traveling at the speed of light, produced
in pp collisions and with the correct BX assignment, the offline time of the
hits in each chamber should be t = 0. Any deviation from 0 could be caused
by backgrounds such as cosmic rays, beam-induced background, chamber noise or
pile-up, or it could even be an indication of new physics, for example a heavy
charged particle moving slowly. More details are provided in Ref. [34].
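As a rough illustration of the last point, a particle with velocity βc arrives later than a light-speed muon over the same path by Δt = (d/c)(1/β − 1). A minimal sketch, with an assumed flight distance of 7 m (a typical order of magnitude for a muon station, chosen here only for illustration):

```python
C = 299_792_458.0  # speed of light [m/s]

def offline_time_offset_ns(distance_m: float, beta: float) -> float:
    """Arrival-time delay of a particle with velocity beta*c with respect to
    a light-speed muon over the same path: dt = (d/c) * (1/beta - 1)."""
    return (distance_m / C) * (1.0 / beta - 1.0) * 1e9

# Hypothetical heavy charged particle with beta = 0.7 reaching a station
# ~7 m from the interaction point (assumed distance): delay of ~10 ns,
# easily visible with the 1 ns RPC time resolution.
delay = offline_time_offset_ns(7.0, 0.7)
```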
2.2.3 The trigger system
The proton-proton collisions at the LHC are spaced by 25 ns, corresponding to
approximately 20 simultaneous pp collisions per bunch crossing at the LHC
design luminosity (10^34 cm^-2 s^-1). It is therefore impossible to store and
process, in such a short time interval, the large amount of data associated
with the high number of collected events. The maximum data rate that can be
managed by the DAQ acquisition system is approximately 100 GB per second, while
the event rate is largely dominated by pile-up events. It therefore becomes
necessary to reduce the rate drastically. This task is carried out by the
trigger system, also known as the "online" selection system.
The CMS experiment uses two trigger levels to decide whether an event is
provisionally accepted or rejected, using information from the sub-detectors.
The first trigger level, or L1, composed of dedicated hardware (ASICs,
FPGAs), selects events of interest and reduces the read-out rate from 40 MHz
to at most 100 kHz, the upper limit imposed by the CMS read-out electronics.
Given the very short time L1 has to make its decision (3.2 µs), it cannot
receive information from the whole detector; for this reason it uses only the
information from the calorimeters and the muon chambers, which are the fastest.
The CMS sub-detectors therefore themselves produce trigger inputs, called
trigger primitives. They are generated at the level of the front-end
electronics (FE) of the sub-detectors and processed in several steps before
being combined into a single piece of information, evaluated by a global
trigger that takes the final decision. In particular, for the muon system, the
DT and CSC FE triggers identify track segments from the hit information (that
is, the point of passage of the particle through one of the detector layers)
recorded in the different detection layers. This is done by pattern-recognition
algorithms that identify muon candidates and measure their momentum
from their curvature in the magnetic field of the return yoke between
measurement positions. The RPC triggers, instead, exploit the hit information
directly: the hits are sent by the FE to logic boards that compare them with
predefined patterns and identify the muon candidate.
The second trigger level, called the high-level trigger (HLT), is software-based:
it further refines the purity of the physics objects and reduces
the rate of collected events to around 400 Hz [35], using information from the
whole event. The selection is made in a way similar to that used in offline
analyses: for each event, objects such as leptons, photons and jets are
reconstructed and identified using criteria that keep only those events of
possible interest for data analysis, as will be discussed in the next chapter.
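The rejection factors implied by these numbers are easy to work out; a back-of-the-envelope sketch using only the rates quoted above:

```python
# Rates quoted in the text.
bunch_crossing_rate = 40e6   # Hz (25 ns bunch spacing)
l1_output_rate      = 100e3  # Hz, read-out electronics limit
hlt_output_rate     = 400.0  # Hz [35]

# L1 rejects a factor 400, the HLT a further factor 250:
l1_rejection  = bunch_crossing_rate / l1_output_rate
hlt_rejection = l1_output_rate / hlt_output_rate
# Overall, only ~1 crossing in 100 000 is kept for offline analysis.
total_rejection = bunch_crossing_rate / hlt_output_rate
```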
2.3 The Run 2 data taking conditions
The LHC was designed to produce on average 25 such pile-up interactions;
during Run 2, however, the LHC surpassed this goal, with on average 32
interactions in 2017 and 2018, and more than 50 interactions in short periods.
When the bunches of protons collide, multiple proton interactions occur. The
particles from the interaction of interest are thus recorded by CMS together
with particles from additional interactions, the so-called pile-up
interactions. Separating the particles of the interaction of interest from the
several hundred particles produced in pile-up interactions is one of the
challenges of every physics analysis of CMS data and, indeed, one of the
topics of the present work [36].
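The quoted pile-up multiplicities follow from μ = σ_inel · L / f_crossing. A minimal sketch using the minimum-bias cross section of figure 2.14; the number of colliding bunches and the revolution frequency are nominal LHC figures assumed here, not taken from this text:

```python
SIGMA_MB = 69.2e-27   # minimum-bias cross section [cm^2] (69.2 mb [26])
LUMI     = 1.0e34     # instantaneous luminosity [cm^-2 s^-1]
N_BUNCH  = 2808       # nominal number of colliding bunch pairs (assumed)
F_REV    = 11245.0    # LHC revolution frequency [Hz] (assumed)

crossing_rate = N_BUNCH * F_REV        # colliding crossings per second
mu = SIGMA_MB * LUMI / crossing_rate   # mean pile-up interactions per crossing
```

With these inputs μ comes out in the low twenties, consistent with the design figure of ~25 interactions per crossing.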
Interactions are distributed along the collision region over several cm in z;
the CMS tracker has a resolution in z better than 1 mm and usually associates
the charged tracks correctly with the individual, separated vertices [32].
Nevertheless, some confusion and overlap between different collision processes
is inevitable. Fortunately, softer interactions deposit very little energy
in the CMS calorimeters, while many of the events of real physics interest
deposit larger amounts. Discrimination by transverse-energy thresholds can
therefore be a valid strategy, and requires a clear separation of high-energy
deposits in the calorimeters from lower-energy ones in the surrounding regions.
Improving the calorimeters improves the performance of the trigger, and this
was one of the first objectives of the upgrade.
The pile-up discussed so far derives from interactions in the same
crossing as the interesting triggered event and is called "in-time pile-up";
it obviously depends on the bunch spacing. These are low-pT particles that
constitute the main source of hits in the tracker and also produce significant
energy deposits in the calorimeters. "Out-of-time pile-up" instead refers to
two sources: "early" pile-up, i.e. the energy left in the calorimeters by
bunch crossings preceding the crossing of interest, and "late" pile-up, i.e.
the energy from the successive bunch crossings [32][10]. In-time pile-up can
be observed in a single bunch crossing through the many collision vertices
reconstructed by the tracking system; an upgraded tracking system can be
designed with additional segmentation so as to associate more charged
particles with the correct interaction points. Out-of-time pile-up can occur
because the intrinsic response time of the sensor or of the electronics is
longer than the 25 ns between one bunch crossing and the next. If the
occupancy of a given channel is small, it is unlikely that another particle
passes through it close enough in time to contaminate the triggered bunch, so
increasing the segmentation of the detector is a way of fighting this type of
pile-up as well. Another source of out-of-time pile-up is the signal from very
slow particles, neutrons in particular, which have scattered many times in the
detector and may eventually deposit energy in an active element.
Figure 2.14 shows the mean number of interactions per bunch crossing
for the 2016, 2017 and 2018 pp runs at 13 TeV.
2.3.1 Future Perspectives
With the perspective of Run 3 and the future HL-LHC, more attention must
be paid to the damage due to the high ionizing radiation, as well as to the
rise of the background rate. The main source of radiation is the particles
produced in the pp collisions: the charged particles, especially pions,
ionize the detectors they pass through; they also undergo nuclear
interactions that produce cascades of particles which add to the radiation
load. The photons, mainly deriving from the decay of π0, interact in the
material of the beam pipe or in the tracking system to produce e+e− pairs, or
reach the calorimeters where they produce electromagnetic cascades.
The particles can also be back-scattered by the calorimeters or escape the
calorimetric shower, interacting with other components of the detector.
Interacting neutrons can in turn produce photons and electrons: a uniform
background of very low-energy neutrons, electrons and photons is created
within the detector volume, completely uncorrelated with the original bunch
structure of the collisions. The upgrade therefore foresees that the ECAL
crystals will operate at lower temperatures, to reduce radiation damage, and
that the read-out electronics will be replaced.
Figure 2.14: Pile-up distribution for the 2016 (a), 2017 (b) and 2018 (c) runs,
using the "CMS recommended" value of the minimum-bias cross section, 69.2 mb [26].
The radiation damage produced varies from one sub-detector to another. In
silicon detectors, the radiation produces defects in the silicon lattice,
resulting in a change of the electrical properties. In the PbWO4 crystal
calorimeters the main problem is the loss of transparency of the medium through
which the scintillation light passes. Over time the signals may therefore
decrease while the noise levels increase, compromising the performance of CMS.
There are in particular two cases where the radiation damage is severe enough
to require a true replacement of the damaged detectors before Run 3: one
involves the inner radius of the forward calorimeter (HF), which receives
enormous doses that could reduce the transmission of the photomultiplier
windows; the other concerns the inner layer of the pixel detector in the
barrel [32]. Moreover, to face the problem, the GEM (Gas Electron Multiplier)
detector technology was adopted, in particular triple-GEMs, which will be
installed in the muon stations, complementing the existing CSCs ME1/1 and
ME2/1: these detectors have a thin profile and the ability to operate well at
particle fluxes far above those expected in the forward region under HL-LHC
conditions. In past experiments GEMs have been shown to be robust and
reliable, with high longevity and high rate capability [10].
In the forward region the background particle rates are higher and the
magnetic bending power is much reduced (see figure 2.8), so the additional
forward muon detectors will increase the average number of muon hits along a
forward track up to about the level already present in the barrel muon region
of CMS.
Chapter 3
Muon Reconstruction and Identification at CMS
The analysis technique at CMS is characterized by two principal steps: an
"online analysis", carried out by the trigger levels mentioned in Chapter
2, and an "offline analysis", based on algorithms which, using specific
selection criteria, reconstruct and identify the physics objects:
leptons, photons and jets. In particular, I focused this study on muons,
given the crucial role they could play in searches for new physics scenarios
with the CMS experiment; the reconstruction and identification of muons are
therefore of central importance.
In this chapter I will dwell on these algorithms for medium-to-high
transverse momentum muons, while in the next chapter I will focus on
the selection techniques for low transverse momentum muons.
3.1 The offline muon reconstruction at CMS
The muon reconstruction is divided into 3 main steps:
• reconstruction of the hits and track segments in the muon system (local
reconstruction);
• track reconstruction, performed independently in the entire muon system and
in the inner silicon tracker;
• combination of these two sources of information and interpretation of the
muon candidate within a description of the overall event, using calorimeter
information too.
The local reconstruction starts with the determination of the hit positions in
the DT, CSC and RPC sub-systems, due to the passage of a muon (or other charged
particle). Hits within each DT and CSC chamber are then matched to
form straight-line track segments (track stubs). These are collected and
matched to generate seeds that are used as starting points for the track fit
of DT, CSC and RPC hits. The result is a reconstructed track in the muon
spectrometer, called a standalone muon, which is then matched with the tracks
reconstructed in the inner tracker to generate a global muon track, featuring
the full CMS resolution. The high-level muon physics objects are thus
reconstructed in a multi-faceted way, the final collection comprising three
different muon types: standalone, global and tracker muons (see Section
3.1.2). Finally, the information on the energy deposits in the calorimeters
and the tracker tracks are combined in the "isolation" observable (see
Section 3.3) for the three different muon collections.
Figure 3.1 shows an example event in which 4 muons are reconstructed,
involving all the main sub-detectors of CMS [28].
3.1.1 Reconstruction of the hits and the segments
Local reconstruction, as already mentioned, uses information from the
individual chambers (DT, CSC or RPC) to determine the passage of a muon through
the chamber. The precise location of each hit is reconstructed, starting from
the electrical signal read by the electronics, using different algorithms
depending on the detector technology used.
In a DT cell, the hit reconstruction determines the transverse distance between
the wire and the intersection, in the cell volume, of the muon trajectory with
the plane containing the wire in that specific detection layer. The algorithm
exploits the assumption of a constant drift velocity: the electrons produced by
the gas ionization when a muon crosses the cell are collected at the anode wire.
A time-to-digital converter (TDC) records their arrival time, denoted T_TDC.
This time is then corrected by the pedestal time, T_ped, and multiplied
by the electron drift speed v, to reconstruct the position of the hit in the DT,
according to the relation:

position = (T_TDC − T_ped) × v
Figure 3.1: Longitudinal (a, R-z plane) and transverse (b, R-φ plane) views
of a collision event in which 4 muons are reconstructed. The thin green lines
in the inner cylinder are the tracks reconstructed in the inner tracker, with
pT > 1 GeV/c; the lines that extend to the muon system are tracks
reconstructed using hits both in the inner tracker and in the muon system.
Three muons are identified by DT and RPC chambers, the fourth by CSCs. The
small black stubs in the muon system show fitted muon-track segments, while
the horizontal red ones in (a) indicate positions of RPC hits. The energy
deposits in the calorimeters are also shown: red bars for ECAL, blue bars for
HCAL [28].
where T_ped accounts for the time from the bunch crossing until the arrival of
the trigger decision at the read-out electronics. It includes the time of
flight (at the speed of light) along a straight line from the interaction
region to the center of the wire, the mean propagation time of the signal
along the wire, the generation of the trigger primitives and the processing
by L1. It also includes a wire-by-wire component that takes into account the
different signal paths within the chamber.
Figure 3.2 shows a T_TDC distribution measured for one super-layer.
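The position relation above can be sketched in a few lines, using the drift speed quoted in footnote 3 (a minimal illustration, not the CMS reconstruction code):

```python
DRIFT_SPEED_UM_PER_NS = 55.0  # electron drift speed in the DT gas [33]
MAX_DRIFT_CM = 2.0            # maximum drift length quoted in Section 2.2.2

def dt_hit_position_cm(t_tdc_ns: float, t_ped_ns: float) -> float:
    """Transverse distance of the muon track from the anode wire,
    position = (T_TDC - T_ped) * v, converted from um to cm.
    Physical values lie within the ~2 cm maximum drift length."""
    return (t_tdc_ns - t_ped_ns) * DRIFT_SPEED_UM_PER_NS * 1e-4

# e.g. a corrected drift time of 200 ns corresponds to 200 * 55 um = 1.1 cm
```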
The segment reconstruction works independently on the R-φ and R-z projections,
which are combined only at the end of the procedure to obtain 3D segment
information. Combining the hits of the R-φ SLs into a 2D segment, the position
resolution reached is about 70 µm. Further details can be found in Ref. [25].
Figure 3.2: Distribution of the signal arrival times, recorded by the TDC,
for all the cells of a single super-layer in a chamber. The continuous line
indicates the fit of the T_TDC rising edge to the integral of a Gaussian
function [37].

In a CSC layer, the hit reconstruction measures the position of the passing
muon from the combination of the information from the cathode strips and the
anode wires. As already mentioned, the wires in the CSCs are orthogonal to
the strips and are grouped in bunches of wires. A hit is therefore
reconstructed at the intersection point between the hit strips and the wire
groups.
The charge distribution due to the passage of a single charged particle is
typically spread over 3 to 5 strips. The simple approach is to obtain the
pulse height in each strip and then cluster the neighboring strips to
determine the most probable incidence position of the muon. Each of the 6
layers of the chamber is treated independently. A 2D hit is constructed at
every intersection of a 3-strip cluster and a group of wires, from the local
values of x and y, with uncertainties computed from the wire resolution
(w/√12, where w is the width of the group of wires). In a subsequent stage,
the hits in the 6 layers of the chamber are fitted to form a track segment.
Typical resolutions per layer are about 50 µm for ME1/1 and between 100
and 150 µm for the rest, as pictured in figure 3.3.
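The two ingredients just described, the charge-weighted strip centroid and the w/√12 wire-group uncertainty, can be sketched as follows (function names are illustrative, not CMS software):

```python
from math import sqrt

def csc_strip_centroid(strip_centers, pulse_heights):
    """Charge-weighted centre of a 3-5 strip cluster: an estimate of the
    muon crossing position along the strip-measured coordinate."""
    total = sum(pulse_heights)
    return sum(x * q for x, q in zip(strip_centers, pulse_heights)) / total

def wire_group_sigma(width):
    """Position uncertainty of a uniform distribution over a wire group
    of width w: sigma = w / sqrt(12)."""
    return width / sqrt(12.0)
```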
Figure 3.3: CSC spatial resolution per layer as a function of chamber type
[38].

While the CSC and DT chambers are multi-layer systems that allow track
segments to be built, the RPCs are single-layer. Since the ionization charge
from a muon can be shared by more than one strip, adjacent strips are grouped
together to form a cluster, and a hit is reconstructed as the center of
gravity of the strip cluster. In the barrel, where the strips are rectangular,
this point is simply the center of the rectangle. In the endcap the
calculation is more complicated; the assumption is that each group of strips
that is "on" results from a single particle crossing the chamber plane, and
that this crossing can take place anywhere, with flat probability, over the
area covered by the strip cluster.
The hit-reconstruction efficiency is calculated as the ratio of the number of
reconstructed hits to the number of expected hits. In Run 2, a
hit-reconstruction efficiency in the range 94-98% (see figure 3.4) and a
muon-segment reconstruction efficiency of 97% were achieved (see figure 3.5).
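The barrel center-of-gravity rule and the efficiency definition above can be sketched as (illustrative helpers, not CMS software):

```python
def rpc_cluster_hit(fired_strip_centers):
    """Barrel RPC hit: centre of gravity of the contiguous fired strips
    (for rectangular strips this reduces to the mean of the strip centres)."""
    return sum(fired_strip_centers) / len(fired_strip_centers)

def hit_efficiency(n_reconstructed, n_expected):
    """Hit-reconstruction efficiency as defined in the text."""
    return n_reconstructed / n_expected
```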
3.1.2 Track reconstruction
Figure 3.4: Hit reconstruction efficiency measured with the 2018 data in
DT (a), RPC barrel (b) and RPC endcap (c) chambers [38].

Figure 3.6 shows a slice of CMS that sketches the underlying logic of the 3
muon-reconstruction algorithms of CMS.
The tracker tracks are reconstructed as part of the CMS general tracking,
based on the Kalman-filter technique [39]; it uses an iterative approach,
executing a sequence of tracking algorithms, each with a slightly different
logic. After each iteration step, the hits that have been associated with a
reconstructed track are removed from the input set used in the next step;
whenever the filter incorporates a new measurement, the track parameters are
recalculated. The system is assumed to be linear, i.e. the track model between
two measurement planes is linear in the parameters. This approach maintains
high performance and reduces processing time.
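The measurement-update step of a Kalman filter can be illustrated in one dimension; a deliberately simplified sketch (the real CMS fit propagates a five-parameter track state between detector layers):

```python
def kalman_update(x_pred, p_pred, z, r):
    """Single 1-D Kalman-filter measurement update: each new hit refines
    the parameter estimate and shrinks its variance.
    x_pred, p_pred: predicted value and variance; z, r: hit and its variance."""
    k = p_pred / (p_pred + r)          # Kalman gain
    x_new = x_pred + k * (z - x_pred)  # pulled toward the measurement
    p_new = (1.0 - k) * p_pred         # variance decreases monotonically
    return x_new, p_new

# Folding in a sequence of hits (purely illustrative numbers):
x, p = 0.0, 100.0                      # vague initial estimate
for hit in (1.2, 0.9, 1.1):
    x, p = kalman_update(x, p, hit, r=0.04)
```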
The standalone tracks are built starting from groups of track segments
locally reconstructed in all the sub-detectors of the muon system along a
muon trajectory, without any constraint to the interaction point, by means of
the Kalman-filter technique, using all DT, CSC and RPC information.
Muon tracks are then built by combining the information from the two track
types in the sub-detectors. Two different approaches are used: Global Muon
and Tracker Muon.
Figure 3.5: The efficiency (in percent) of each CSC in the CMS endcap
muon detector to provide a locally reconstructed track segment, as
measured from 2017 data.

Figure 3.6: Slice of CMS and main reconstruction algorithms: the green
circle indicates the tracker track from the inner tracker; the red one the
standalone-muon track from the muon system. Different combinations of these
two approaches are performed by the Global Muon and Tracker Muon algorithms.

• Global Muon reconstruction: the tracks are built "outside-in": the
algorithm starts from a standalone reconstructed muon and extrapolates its
trajectory from the innermost muon station to the external surface of the
tracker, taking into account the energy loss in the material and multiple
scattering [25]. Each standalone track can then be compared with the tracker
tracks and matched with the best one. This is done by comparing the
parameters of the two tracks propagated onto a common surface, within a
certain region of interest. The determination of this region is based on the
track parameters and on the uncertainties of the corresponding extrapolated
track, obtained under the assumption that the muon comes from the interaction
point. The choice of the region of interest has a great impact on efficiency:
well-measured muons are reconstructed faster and more efficiently than poorly
measured ones. Within the region of interest, the initial candidates for the
muon trajectory are built from pairs of reconstructed hits, the two hits
coming from two different tracker layers: all possible combinations of pixel
and silicon-strip layers are exploited to achieve high efficiency. Starting
from these hits, the tracks are reconstructed using the Kalman-filter
technique, based on the standalone- and tracker-track information.
• Tracker Muon reconstruction: here the tracks are built "inside-out",
i.e. starting from the tracker tracks and assigning to each of them a
compatibility value with a muon in the muon system (including those tracks
not associated with any standalone track in the muon detector). Each tracker
track with pT > 0.5 GeV/c and total momentum p > 2.5 GeV/c is considered a
possible muon candidate and is extrapolated to the muon system, taking into
account the magnetic field, the expected average energy losses and the
multiple Coulomb scattering in the detector material. If at least one muon
segment (from DT or CSC hits) is matched to the extrapolated track, the
corresponding tracker track qualifies as a Tracker Muon.
The track-segment matching is performed in a local coordinate system (that
of the chamber), in which the local x is the coordinate measured in the R-z
plane and the local y is the one orthogonal to it. The segment and the
extrapolated track are considered matched if the distance between them in the
local x coordinate is less than 3 cm or if the value of the pull1 for the
local x is less than 4 [28].
1The pull is defined as the difference between the position of the matched
segment and that of the extrapolated track, divided by their combined
uncertainties.
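The matching criterion just described, |Δx| < 3 cm or pull < 4, can be sketched as follows (a hypothetical helper, not the CMS implementation):

```python
from math import sqrt

def segment_matches_track(x_seg, sigma_seg, x_track, sigma_track,
                          max_dx_cm=3.0, max_pull=4.0):
    """Tracker Muon arbitration in local chamber coordinates: the segment
    and the extrapolated track match if |dx| < 3 cm OR the pull (dx over
    the combined uncertainty) is below 4."""
    dx = abs(x_seg - x_track)
    pull = dx / sqrt(sigma_seg**2 + sigma_track**2)
    return dx < max_dx_cm or pull < max_pull
```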
The performance of the reconstruction algorithms is reported in Appendix B.
3.2 Muon identification at CMS
Particles detected as muons are produced in pp collisions from different
sources, which lead to different experimental signatures:
• Prompt muons: muons arising either from decays of W and Z bosons or of
promptly produced quarkonia states, or from other sources such as Drell-Yan
processes or top-quark production.
• Muons from heavy flavour: muons produced in the decay of a beauty or
charmed hadron, or of a τ lepton.
• Muons from light flavour: muons from decays in flight of light hadrons
(π and K) or, less frequently, from the decay of particles produced in
nuclear interactions in the detector material.
• Hadron punch-through: in this class most of the muon-chamber hits were
produced by a particle that was not a muon. The so-called "punch-through"
(i.e. hadron-shower remnants penetrating the calorimeters and reaching the
muon system) is the most common source of these candidates, although
"sail-through" (i.e. particles not undergoing nuclear interactions upstream
of the muon system) is present as well.
Physics analyses can set the desired balance between identification efficiency
and purity by applying a selection based on various muon-identification
variables (muon IDs). Such muon IDs are based on reconstruction-related
variables, such as the track-fit χ2, the number of hits per track (in the
inner tracker, in the muon system, or in both) and the degree of matching (a
value between 0 and 1) between the tracker track and the standalone track (for
"global" muons). An algorithm searching for "kinks" (a sudden change in the
direction of the track, interpreted as a charged particle decaying into a
neutral and a charged particle) breaks the tracker track into two separate
tracks at different points along the trajectory; for each division, the
algorithm compares the two separate tracks, and a high value of χ2 indicates
that they are incompatible with being a single track. Moreover, some muon IDs
exploit inputs external to the track of the reconstructed muon, such as the
compatibility with the primary vertex.
The main muon identification algorithms used in CMS physics analyses include:
• Loose muon ID: a "loose" muon is a muon selected by the Particle
Flow algorithm² (PF). Moreover, it is required to be either a tracker
muon or a global muon. The Loose ID aims to identify the muons from
the decays of heavy and light hadrons, while keeping a low rate of
misidentification of charged hadrons as muons. The Loose ID also allows
a prompt muon to be identified, by complementing the previous criteria
with requirements on the impact parameter.
• Tight muon ID: a "tight" muon is a "loose" muon with a tracker track
that uses hits from at least 6 layers of the inner tracker, including
at least one hit in the pixel detector. The muon candidate must be
reconstructed as both a tracker muon and a global muon. The tracker
muon must have segments matched in at least 2 muon stations, while
the global-muon fit must have χ²/dof < 10 and include at least one hit
from a muon chamber. A tight muon must be compatible with the primary
vertex, with a transverse impact parameter < 0.2 cm and a longitudinal
impact parameter < 0.5 cm.
With this selection, the rates of muons from decays in flight and from
punch-through are significantly reduced, at the price of an efficiency
loss of a few percent for prompt muons [28].
• Soft muon ID: it is optimized for low-pT (< 100 GeV/c) muons for B-physics
and quarkonia analyses. A "soft" muon is a tracker muon with
a tracker track that uses hits from at least 6 layers of the inner tracker,
including at least one hit in the pixel detector. This selection requires
a muon segment to be matched in both the x and y coordinates to the
extrapolated tracker track, such that the pull is less than 3.
A soft muon is loosely compatible with the primary vertex, with a
distance in the transverse plane |dxy| < 0.3 cm and |dz| < 20 cm [34].
²The Particle Flow algorithm uses the best combination of all CMS sub-detectors to
measure the properties of individual particles and to identify them as electrons, hadrons
or muons. In particular, muons are reconstructed using the information from the tracker
and the muon system: a tracker muon is built from the inner tracker and required to match
muon segments, while a global muon is required to have a combined track in both
sub-detectors. The tracks identified as muons are then flagged for the next step of the
reconstruction.
• High-pT muon ID: it is optimized for muons with pT > 200 GeV/c. This
object is reconstructed as both a tracker muon and a global muon. The
requirements on the tracker track, on the tracker muon and on the transverse
and longitudinal impact parameters are the same as for the "tight" muon, as
is the demand for at least one hit from the muon system in the global-muon
fit. However, unlike the Tight ID, the requirement on the χ²/dof of the
global fit is removed. This modification avoids inefficiencies at high pT,
where muons radiate wide electromagnetic showers as they pass through the
steel return yoke of the magnet, giving rise to additional hits in the muon
chambers.
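The Tight and Soft requirements listed above can be sketched as boolean selectors (a simplified model; field names are illustrative, not the CMS data format):

```python
def passes_tight_id(mu):
    """Hedged sketch of the Tight ID cuts listed above; 'mu' is an
    illustrative dict, not the actual CMS event content."""
    return (mu["is_pf"] and mu["is_tracker"] and mu["is_global"]
            and mu["tracker_layers"] >= 6 and mu["pixel_hits"] >= 1
            and mu["matched_stations"] >= 2
            and mu["global_chi2_ndof"] < 10.0 and mu["muon_hits"] >= 1
            and abs(mu["dxy_cm"]) < 0.2 and abs(mu["dz_cm"]) < 0.5)

def passes_soft_id(mu):
    """Sketch of the Soft ID: a tracker muon with a high-quality
    tracker track and an arbitrated segment match, loosely
    compatible with the primary vertex."""
    return (mu["is_tracker"] and mu["segment_matched"]
            and mu["tracker_layers"] >= 6 and mu["pixel_hits"] >= 1
            and abs(mu["dxy_cm"]) < 0.3 and abs(mu["dz_cm"]) < 20.0)

mu = {"is_pf": True, "is_tracker": True, "is_global": True,
      "segment_matched": True, "tracker_layers": 8, "pixel_hits": 2,
      "matched_stations": 2, "global_chi2_ndof": 1.3, "muon_hits": 5,
      "dxy_cm": 0.01, "dz_cm": 0.1}
assert passes_tight_id(mu) and passes_soft_id(mu)
```

A slightly displaced candidate (e.g. |dxy| = 0.25 cm) fails the Tight vertex cut while still passing the looser Soft one, which illustrates the efficiency/purity trade-off.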
3.3 Muon isolation
Since muons produced in Z, W and τ decays are expected to be isolated
in the detector, while leptons from heavy-flavour decays and from π and K
decays in flight are expected to lie inside jets, muon isolation is a
powerful tool to distinguish them. Figure 3.7 shows, as an example, the
performance of the Tight isolation algorithm as a function of the transverse
momentum.
Figure 3.7: Tight isolation efficiency vs pT [40].
The muon isolation is usually calculated using detector information from
the tracker and the calorimeters, by means of two different algorithms: the
Tracker relative isolation, a track-based isolation, and the Particle-flow
relative isolation, a particle-based isolation.
Tracker relative isolation: it computes the scalar sum of the pT of all
tracker tracks reconstructed in a cone of radius ∆R ≡ √((∆φ)² + (∆η)²),
centred on the muon track direction, where ∆η and ∆φ are the distances in
pseudorapidity and azimuthal angle between the deposit (the sum of the
transverse momenta of the reconstructed tracks) and the cone axis (see
figure 3.8).
Figure 3.8: Isolation cone, whose axis is the direction of the muon at the
vertex [25].
The HCAL and ECAL energy deposits in the cone are computed, and the muon
contribution to the energy measurement inside the cone is removed by
excluding a small area around the muon (the "veto value") from the cone.
Comparing the deposit in the cone with a predefined threshold determines
whether the muon is isolated. This algorithm has high efficiency and a small
dependence on pile-up [41].
Particle-flow relative isolation uses charged and neutral particles from the
particle-flow (PF) algorithm: the pT of the charged particles originating
from the primary vertex is summed together with the total energy of the
neutral particles in the same cone. The contribution of pile-up to the
neutral component is corrected for by computing the sum of the charged-hadron
deposits originating from pile-up vertices, scaling it by a factor of 0.5,
and subtracting it from the neutral-hadron and photon sums, to give the
corrected energy sum from neutral particles. The factor of 0.5 is estimated
from simulation to be approximately the ratio of neutral-particle to
charged-hadron production in inelastic proton-proton collisions [34].
The values for the tight and loose working points for PF isolation within
∆R < 0.4 are 0.15 and 0.25, respectively, while the values for track based
isolation within ∆R < 0.3 are 0.05 and 0.10 [34].
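The two isolation sums can be sketched as follows (a simplified model with pre-computed cone sums; the 0.5 pile-up factor and the working-point values are the ones quoted above, everything else is illustrative):

```python
def tracker_rel_iso(mu_pt, track_pts_in_cone):
    """Track-based relative isolation: scalar pT sum of the tracks in
    the ΔR cone (the muon's own track is assumed already excluded),
    divided by the muon pT."""
    return sum(track_pts_in_cone) / mu_pt

def pf_rel_iso(mu_pt, ch_had, neu_had, photons, pu_ch_had):
    """PF relative isolation with the pile-up correction quoted above:
    the neutral sums are reduced by 0.5 x the charged pile-up sum,
    clipped at zero."""
    neutral = max(0.0, neu_had + photons - 0.5 * pu_ch_had)
    return (ch_had + neutral) / mu_pt

mu_pt = 20.0
# 0.7 GeV of track pT in the cone: passes the tight track WP (0.05)
assert tracker_rel_iso(mu_pt, [0.4, 0.3]) < 0.05
# (1.0 + max(0, 1.5 + 0.8 - 0.5)) / 20 = 0.14: passes the tight PF WP (0.15)
assert pf_rel_iso(mu_pt, 1.0, 1.5, 0.8, 1.0) < 0.15
```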
In conclusion, the muon reconstruction creates muon objects, the
identification helps solving potential ambiguities at the analysis level,
and the isolation distinguishes muons from jets.
3.4 Performance of Muon Identification and Isolation Algorithms
The performance of the muon ID and isolation algorithms is studied with the
"tag-and-probe" method, starting from a tracker track as "probe". This
technique consists in selecting particles of the desired type (muons or
electrons) coming from the decay of di-object resonances of known mass, such
as the Z, Υ or J/Ψ. The "tag" is an object that passes a set of very tight
selection criteria, designed to isolate the required particle type. The
"probes" are selected by pairing them with the tags such that the invariant
mass of the combination is consistent with the mass of the resonance. The
probes are then used to measure the efficiency of a particular selection
criterion.
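A schematic tag-and-probe efficiency calculation, assuming simple event counting (actual CMS measurements fit the invariant-mass spectra of passing and failing probes to subtract background; the Z-mass window below is illustrative):

```python
import math

def in_mass_window(m_inv, lo=81.0, hi=101.0):
    """Illustrative Z-mass window used to pair a tag with a probe."""
    return lo < m_inv < hi

def tnp_efficiency(n_pass, n_total):
    """Efficiency of a selection on the probe sample, with a simple
    binomial uncertainty."""
    eff = n_pass / n_total
    return eff, math.sqrt(eff * (1.0 - eff) / n_total)

assert in_mass_window(91.2) and not in_mass_window(60.0)
eff, err = tnp_efficiency(980, 1000)   # 98% efficient selection
```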
Figure 3.9 reports the efficiency of the Loose and Tight identification
algorithms, for muons with pT > 20 GeV, measured in Z → µµ events.
The slight improvement of the 2018 performance with respect to 2017 is due
to the recovery of some CSC chambers.
The performances of the Loose and Tight working points are consistent
across the different years.
The drops visible in the Tight ID at around |η| = 0.2 are due to the gap
between two wheels of the muon system.
Figure 3.10 shows the Loose tracker isolation and the Tight particle-flow
isolation efficiencies, as a function of η, for muons with pT > 20 GeV. In
both cases there is reasonable agreement between data and simulation.
Figure 3.9: Tag-and-probe efficiency for identification in 2016-2017-2018
data for loose (a) and tight (b) muons, with pT > 20 GeV [41].
Figure 3.10: Loose tracker (left) and Tight PF (right) isolation efficiency
in 2017 data, for muons with pT > 20 GeV [41].
The performances shown so far are obtained for pT > 20 GeV, in Z → µµ
events. Figure 3.11 reports instead the performance of the Soft muon ID,
measured in J/Ψ → µµ events, for muon pT > 3 GeV.
Figure 3.11: Low-pT Soft ID efficiency vs η, using the tag-and-probe method.
The Soft muon ID is commonly used in analyses involving low-pT muons in the
final state, and exhibits high efficiency over the whole CMS acceptance.
Complementary information in this context concerns its efficiency on the
background, which will be investigated in the next chapter.
Chapter 4
Study of a low-pT muon identification algorithm with MVA techniques
4.1 Introduction
Muon reconstruction and identification algorithms in the CMS experiment
show excellent performance at medium pT (5-10 GeV), intermediate pT (10-200
GeV) and high pT (> 200 GeV). The purpose of my thesis was to extend
these muon identification algorithms to the low-pT region (< 3-4 GeV), for
possible future searches of new physics at CMS. As discussed in Section 1.6,
many interesting new-physics scenarios involve low-pT muons in the final
state, e.g. muons coming from the decay of heavier leptons with lepton
flavour violation, or from the decay of light bosons foreseen by SUSY theories.
A dedicated muon identification algorithm can be particularly useful when
the muons are characterized by a soft pT spectrum, since such muons are
more likely not to be fully reconstructed: the entire muon identification
chain in CMS, from detection up to reconstruction, has been optimized mainly
for muons coming from heavy-boson decays (W, Z, Higgs). The loosening of
the reconstruction requirements, needed for low-pT muon tagging, leads
to an increased probability of having a low-pT hadron incorrectly
reconstructed as a muon (fake rate). In this context, as will be shown in
section 4.4, two main sources of background can be identified:
• muons from K and π decays in flight;
• K, π and p produced with a non-negligible relativistic boost, reaching the
muon system and thus being misidentified as muons.
The former are real muons that can be distinguished from the prompt muons
only by studying the secondary vertex and its related variables measured
in the tracker. The latter, instead, must be distinguished using calorimetric
and muon-system related observables, and are the main focus of the work
developed in this thesis.
A feasibility study to discriminate signal muons from background ones has
been carried out using DS → τντ, τ → 3µ as a physics case, exploiting
Multi-Variate Analysis (MVA) and Machine Learning (ML) techniques
(described in appendix A). The study has been performed in four phases:
1. firstly, I studied the performance of the muon identification algorithms
commonly used in CMS, using dedicated muon-enriched datasets;
2. I investigated the background composition in searches involving low-pT
muons;
3. then I developed a new algorithm based on Machine Learning techniques,
combining several variables related to the muon reconstruction;
4. finally, I studied the performance of the new algorithm in dedicated
control samples, independent from the ones used for the training.
In this chapter the full study is presented, exploiting the Toolkit for
Multivariate Data Analysis (TMVA, see appendix A) with ROOT [42], a
data-analysis framework widely used in high-energy physics.
4.2 Monte Carlo simulation in CMS
To build a model that can discriminate well the signal muons from the
background ones, I used samples generated with the Monte Carlo (MC)
technique.
The MC events are generated with a computer simulation program for
high-energy collisions, PYTHIA [43], which includes an implementation of the
theoretical models underlying the proton-proton collisions and the emerging
particles. The particles emerging from the primary interactions, and their
decays, are stored as GenParticles, a data format that contains the type of
the particle: this information is saved according to the PDG Monte Carlo
numbering scheme [4] and is indicated by the PdgId variable for the particle
under examination, or with the prefix "Mother", referring to its parent
particle. A list of the PdgId numbers most used in this study is reported in
table 4.1.
Table 4.1: Main PdgId numbers used in this thesis.
PdgId(µ) 13
PdgId(τ) 15
PdgId(K) 321
PdgId(π) 211
PdgId(p) 2212
The GenParticles are then processed through a simulated detector based
on GEANT4 [44], a toolkit that simulates how particles propagate through
space, interact with the detector material and lose energy. The resulting
data are stored in the form of "simulated hits" (simHits), which contain the
information about the part of the detector in which the hit has been
generated, the particle type, the process in which the particle that caused
the hit was produced, the energy deposited in the detector unit, the entry
point in the local coordinate system, the momentum at entry and the time of
flight from the primary vertex. The simHits are then used to build the
related tracks (simTracks).
Starting from the positions of the hits and from the simulated energy losses
of a passing particle, a digitization phase follows, which simulates the
response of the readout electronics of the detectors, which must be as close
as possible to that of real data: the simHits are converted into digis. The
so-called pile-up interactions (see section 2.3), created by multiple
proton-proton collisions happening in the same bunch crossing, are then
superimposed on the hard-scattering interaction; they are simulated
separately and mixed in at the digitization step, to avoid repeating the
time-consuming detector simulation for every event.
From the digis, the reconstructed hits in each sub-detector are derived
(recHits), which contain the information about the energy and the position
for a single detector element. Sophisticated algorithms are then run on top
of the recHits to build the higher-level objects, such as tracks, muons,
electrons, photons and jets.
The reconstructed muons used in this study are built with the Tracker or
Global algorithms, described in Section 3.1.2.
4.2.1 Matching generated to reconstructed particles
To facilitate the interfacing between event generators, detector simulators
and the analysis packages used in particle physics, a set of Monte Carlo
truth information is stored in the reconstructed object, which keeps track of
the corresponding generated and simulated objects. In particular, an
association between the reconstructed muon and the simulated muon track is
performed hit by hit, both in the tracker and in the muon detectors. The
quality of the association is given by the degree of hit sharing between the
reconstructed and the simulated track. The reconstructed muon with the best
matching quality is then chosen to be associated to a given simulated track.
Since at the sim-track level the particle identity and the generator-level
kinematics are known, this information is also kept in the reconstructed
object.
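The hit-sharing association can be sketched as follows (the data layout is illustrative; CMS performs this with dedicated associator tools):

```python
def best_reco_match(sim_hits, reco_muons):
    """Sketch of the hit-by-hit association described above: each reco
    muon carries the set of (sub-detector, hit-id) pairs used in its
    fit; the match quality is the shared-hit fraction, and the reco
    muon with the best quality is assigned to the simulated track."""
    best, best_q = None, 0.0
    for reco in reco_muons:
        shared = len(sim_hits & reco["hits"])
        quality = shared / len(sim_hits) if sim_hits else 0.0
        if quality > best_q:
            best, best_q = reco, quality
    return best, best_q

sim = {("DT", 1), ("DT", 2), ("CSC", 7), ("TK", 3)}
mu_a = {"name": "a", "hits": {("DT", 1), ("TK", 3)}}
mu_b = {"name": "b", "hits": {("DT", 1), ("DT", 2), ("CSC", 7)}}
match, q = best_reco_match(sim, [mu_a, mu_b])
assert match["name"] == "b" and q == 0.75   # 3 of 4 sim hits shared
```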
4.3 Signal description: DS → τντ → 3µντ
The MC simulated sample used to study the signal muons is centrally
produced by CMS, with the conditions of the 2017 pp collisions and data
taking. The details of this dataset are reported in table 4.2; in the
following I will refer to this sample as the DsTau3mu sample.
Table 4.2: Signal MC dataset.
Dataset name: DsToTau_To3Mu_MuFilter_TuneCUEP8M1_13TeV-pythia8/RunIIFall17DRPremix-PU2017_94X_mc2017_realistic_v11-v1/AODSIM
Number of events: 3.6 × 10⁶
The LFV τ → 3µ decay has been chosen as a physics case here, though the
results achieved in this context could easily be extended to other
final-state topologies with the same kinematics.
The dataset is generated using PYTHIA 8, with a Minimum Bias configuration
(i.e. generic inelastic pp collisions with all the associated emerging
tracks), and the decay chain is set up with EVTGEN [45] as follows: if a DS
meson is found after hadronization, it is forced to decay to a τ, which is
subsequently set to decay to 3µ. The τ → 3µ process is assumed to be a
3-body phase-space decay.
Only generated events containing at least two generated muons within the
CMS acceptance are then processed with the time-consuming full simulation.
Figure 4.1 reports the generator-level pT and η distributions of the muons
coming from the τ decay, where the leading muon is the one with the highest
pT (green distribution) and the trailing muon is the one with the lowest pT
(blue distribution) among the three muons.
Figure 4.1: η (a) and pT (b) distributions of the 3 muons from the τ decay,
at generator level, within the CMS acceptance; the leading muon is the one
with the highest pT (green) and the trailing muon the one with the lowest pT
(blue) among the three muons.
The bulk of the pT spectrum of the signal muons lies below 10 GeV. In
particular the lowest-pT muon, the trailing one, peaks around 2 GeV. In the
context of this search it is crucial to preserve the highest detection
efficiency on all three muons: losing one of them due to an incorrect
identification would translate into a reduction of the signal acceptance and
thus of the sensitivity of the search for such a rare process (see section
1.6.1).
4.4 Study of the background composition
The DS → τντ → 3µντ sample has been created using the realistic conditions
of the pp collisions of the 2017 data taking. This means that each event of
interest from the primary interactions has been superimposed on the pile-up
interactions, which are a rich source of background muons.
In the following, muons from the τ decay will be referred to as "signal"
muons. Reconstructed muons not matched at gen-level with the muons from the
τ decay are referred to as "background". Background muons have been divided
into four categories, depending on the MC truth information:
1. muons (PdgId = 13) from K (Mother_PdgId = 321) decays: candidates
produced by a true muon from a decay in flight of a kaon;
2. muons (PdgId = 13) from π (Mother_PdgId = 211) decays: candidates
produced by a true muon from a decay in flight of a pion;
3. muons (PdgId = 13) from other decays: true muons produced by particles
other than kaons and pions. Most of these muons are produced in D and B
meson decays. A full list of the particles decaying into muons, together
with their occurrence in the MC sample, is reported in table 4.3;
4. non-muons (PdgId ≠ 13): the genParticles matched with these wrongly
reconstructed muons are mainly pions, protons and "unknown" (PdgId = 0),
when the reconstructed muon cannot be associated to any generator-level
particle. As discussed in section 4.2.1, this happens either when the
muon track is too soft and the generator-level information is not saved,
or when the number of hits shared with any simulated track is not
sufficient. A full list, together with the occurrence in the MC sample, is
reported in table 4.4.
Table 4.3: Percentage fractions of the gen-level particles decaying into
muons found in the DS → τντ → 3µντ MC sample.

Table 4.4: Percentage fractions of the gen-level particles with PdgId ≠ 13
associated to a reconstructed muon, as explained in section 4.2.1, with the
respective errors, in the DS → τντ → 3µντ MC sample.

The most relevant contribution to the background in the MC signal sample
comes from the fourth category, containing hadrons that reach the muon
system. The probability for a hadron to reach the muon system is very small,
because of the large energy loss due to nuclear interactions with the CMS
material budget. However, it turns out to be non-negligible with respect to
the signal, as will be discussed in the next section. This probability has
been computed as the ratio between the number of kaons, pions and protons
that are tagged as reconstructed muons and the total number of simulated
tracks associated to kaons, pions and protons that do not decay before
reaching the muon system. The results are reported in table 4.5.
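The ratio just defined can be sketched as follows (the counts below are invented purely to reproduce the pion order of magnitude of table 4.5):

```python
def tag_probability(n_tagged_as_muon, n_reaching_muon_system):
    """Ratio defined in the text: hadrons tagged as reconstructed
    muons over simulated hadron tracks that survive (do not decay)
    up to the muon system."""
    return n_tagged_as_muon / n_reaching_muon_system

# Illustrative counts reproducing the pion value of table 4.5 (4 x 10^-3):
p_pion = tag_probability(400, 100_000)
assert abs(p_pion - 4e-3) < 1e-12
```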
In order to compare the different categories of background, I studied the
Table 4.5: Probability for a hadron to be tagged as a muon.
Particle Probability
Pions 4 × 10⁻³
Kaons 2 × 10⁻³
Protons 4 × 10⁻⁴
distributions of the kinematic variables, shown in figure 4.2.
Figure 4.2: η (a) and pT (b) distributions of the reconstructed muons, split
into the four background categories.
The distributions show a significant background contamination in the low-pT
range, where a large part of the signal muons lies. As already mentioned,
the most abundant contribution comes from other particles erroneously
identified as muons (non-muons). Figure 4.3 shows the pT distribution of
each category of non-muons (as listed in table 4.4).
In the search for a very rare process, the important point is to maximize
the signal acceptance, while keeping the background level at the order of a
few percent. In the specific context of the τ → 3µ search, this means that
the muon identification algorithms should guarantee this performance even on
the trailing muon, to preserve the signal acceptance. In the present work I
developed a dedicated muon ID using Machine Learning techniques, targeting
low-pT muons and focusing on the non-muon category, it being the most
abundant.
Figure 4.3: pT distribution of the 4 categories of the non-muons. The
histograms are normalized to their area.
4.5 Performance of the standard muon ID algorithms
The figures of merit used to quantify the performance of the muon ID
algorithms are the efficiency and the fake rate.
The efficiency is defined as the ratio between the number of reconstructed
muons associated to a true muon at generator level that fulfil a given ID
algorithm and the total number of associated muons:

ε_sig = (# recoMuons associated to a true muon that pass the ID) / (# recoMuons associated to a true muon)
This quantity was computed using the muons coming from the τ → 3µ decay,
for three standard ID algorithms: the Soft, Loose and Tight ID (see figure
4.4).
The Soft and Loose muon IDs show an efficiency close to 100%, while the
Tight ID is inefficient, due to its strict requirements on the vertex.
The fake rate is defined as the ratio between the number of reconstructed
muon candidates not associated to a true muon at generator level that fulfil
the criteria imposed by a given identification algorithm and the total
number of non-associated muons:

fake rate = (# recoMuons not associated to a true muon that pass the ID) / (# recoMuons not associated to a true muon)
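Both figures of merit can be computed with a few lines, given per-candidate gen-matching flags and ID decisions (the format is illustrative):

```python
def id_performance(gen_matched, id_pass):
    """Efficiency and fake rate as defined above: 'gen_matched' flags
    reco muons associated to a true muon; 'id_pass' is the per-muon
    ID decision."""
    n_true = sum(1 for m in gen_matched if m)
    n_true_pass = sum(1 for m, ok in zip(gen_matched, id_pass) if m and ok)
    n_fake = len(gen_matched) - n_true
    n_fake_pass = sum(1 for m, ok in zip(gen_matched, id_pass) if not m and ok)
    eff = n_true_pass / n_true if n_true else 0.0
    fake_rate = n_fake_pass / n_fake if n_fake else 0.0
    return eff, fake_rate

matched = [True, True, True, False, False]
passed  = [True, True, False, True, False]
eff, fr = id_performance(matched, passed)
assert abs(eff - 2 / 3) < 1e-12 and abs(fr - 0.5) < 1e-12
```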
Figure 4.4: Soft, Loose and Tight efficiency vs η (a) and pT (b), for muons
from the τ → 3µ decay.
It quantifies the efficiency of a given muon identification algorithm on the
non-muon background or, in other words, the false-positive rate. The fake
rate for non-muons has been evaluated for the Soft, Loose and Tight IDs and
is reported in figure 4.5.
Figure 4.5: Soft, Loose and Tight fake rates vs η (a) and pT (b).
It can be noticed that the fake rate is on average 50% for the Soft ID and
40% for the Loose ID. The Tight ID shows a very small fake rate (about 5%),
but on the other hand it is characterized by a small efficiency on the
signal.
In addition, I defined ε(K,π→µ) as the probability for muons coming from K
and π decays to pass the Soft, Loose and Tight ID criteria; the computed
values are reported in figure 4.6.
Figure 4.6: Soft, Loose and Tight fake rates vs η (a) and pT (b).
The probability for a hadron to be fully identified as a muon is given by
the convolution of the probability for hadrons to reach the muon system
(table 4.5) with the muon ID fake rates shown in figure 4.5.
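As a worked order-of-magnitude example of this convolution (using the pion entry of table 4.5 and the approximate 50% Soft-ID fake rate quoted above):

```python
# Order-of-magnitude estimate only; the two factors are the approximate
# values quoted in the text for pions.
p_reach = 4e-3      # probability for a pion to reach the muon system
p_soft_fake = 0.5   # approximate Soft-ID fake rate on such candidates
p_total = p_reach * p_soft_fake
assert abs(p_total - 2e-3) < 1e-12   # roughly 2 per mille overall
```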
It is important to underline that in the search for very rare processes,
such as τ → 3µ, the background coming from standard-model processes is
O(10⁷) with respect to the expected signal (see section 1.6.1). The aim of
this study is to significantly reduce the overall fake rate by implementing
a new muon ID algorithm.
4.6 Implementation of the MVA discriminator
As discussed in section 4.1, searches for new particles or for LFV decays
involving low-momentum muons in the final state have to deal with a
background that is typically orders of magnitude higher than the signal.
Therefore it is crucial to identify the final-state muons in the best
possible way. This means that the algorithm dedicated to the muon
identification should be characterized by:
• high efficiency, as close as possible to 100%;
• negligible fake rate.
As shown in the previous paragraph, an irreducible source of background for
these searches is given by the semi-leptonic decays of heavy- and
light-flavour mesons (K, π, but also D and B mesons): this background can be
removed by topological requirements applied at the analysis level, and is
not studied in the present work. On the other hand, a relevant contribution
to the background comes from the light-flavour mesons erroneously identified
as muons. The standard muon identification algorithms are not optimized for
the ultra-soft pT range (< 3-4 GeV), the fake rate being ∼50% for the Soft
ID and 40% for the Loose ID (see figure 4.5).
The idea is to develop a multivariate discriminator, combining several input
variables describing the quality of the muon reconstruction, the energy
deposits in the calorimeters and the timing information, to distinguish
low-momentum muons from pions and kaons, with very high efficiency and a
fake rate of a few percent.
This is a binary classification problem, which will be treated with Machine
Learning (ML) techniques, by training proper discriminators. These are
sophisticated techniques which use input information from multiple variables
from various sources (called features), automatizing the Multivariate
Analysis (MVA): they are trained on a data sample in order to make
predictions on unknown datasets and learn how to classify new data. They
provide a "response" that is the result of the discrimination between the
event of interest (defined as signal) and the background.
There are many possible classification techniques, or classifiers: the two
used in this analysis are the Boosted Decision Tree (BDT) and the Multilayer
Perceptron (MLP); see appendix A.
Boosted Decision Tree
The BDT is a binary tree classifier with a root node, from which a sequence
of splits unfolds. Each split uses the variable that, at that node, gives
the best separation between signal and background when cut on. Since a cut
that selects predominantly background is as important as one that selects
signal, the criterion is symmetric with respect to the two event classes.
The node splitting stops once it has reached the minimum number of events,
the maximum number of nodes or the maximum depth. The leaf nodes at the
bottom end of the tree are labelled signal or background, depending on the
majority of the events that end up in the respective nodes [46].
Boosting a decision tree (a recursive algorithm applied to reweighted
(boosted) versions of the training data) extends this concept to several
trees, forming a forest: the trees are derived from the same training
ensemble by reweighting the events, and are finally combined into a single
classifier, given by a (weighted) average of the individual decision trees.
The boosting used in this study is AdaBoost (see appendix A).
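A minimal, self-contained AdaBoost sketch on one-dimensional decision stumps (not the TMVA implementation; the names and the toy dataset are illustrative) showing the event reweighting and the weighted vote described above:

```python
import math

def train_stump(xs, ys, ws):
    """Best single-threshold cut on a 1-D feature (weighted error)."""
    best = (None, 1, float("inf"))      # (threshold, polarity, error)
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, ws)
                      if (pol if x >= thr else -pol) != y)
            if err < best[2]:
                best = (thr, pol, err)
    return best

def adaboost(xs, ys, n_trees=5):
    """AdaBoost: boost misclassified events, combine stumps with
    weight alpha = 0.5 * ln((1 - err) / err)."""
    ws = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_trees):
        thr, pol, err = train_stump(xs, ys, ws)
        err = max(err, 1e-9)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((thr, pol, alpha))
        ws = [w * math.exp(-alpha * y * (pol if x >= thr else -pol))
              for x, y, w in zip(xs, ys, ws)]
        norm = sum(ws)
        ws = [w / norm for w in ws]
    return ensemble

def predict(ensemble, x):
    s = sum(a * (p if x >= t else -p) for t, p, a in ensemble)
    return 1 if s >= 0 else -1

# Separable toy data: signal (+1) at high x, background (-1) at low x.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [-1, -1, -1, 1, 1, 1]
model = adaboost(xs, ys)
assert all(predict(model, x) == y for x, y in zip(xs, ys))
```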
Multilayer Perceptron
The MLP is a type of artificial neural network which, by applying an
external signal to some binary on/off inputs, is put into a defined state
that can be measured from the response of one or more binary outputs.
The basic unit of computation is a neuron, called a Linear Threshold Unit
(LTU), in which each input connection is associated with a weight, which
tells the neuron to respond more to one input and less to another. The LTU
computes a weighted sum of its inputs and passes it to a non-linear
function, called the Activation Function, which takes a single number and
performs a fixed mathematical operation on it: if the result exceeds a
threshold, it outputs the positive class, otherwise the negative one.
The Perceptron is composed of a single layer of LTUs, with each neuron
connected to all the inputs, plus a "bias neuron" which constantly outputs 1
(the input layer), and of a single layer of LTUs that provide a response
(the output layer). For every output neuron that produces a wrong
prediction, the training reinforces the connection weights from the inputs
that would have contributed to the correct prediction.
The Multilayer Perceptron was introduced to improve the Perceptron
performance. It is composed of an input layer, one or more layers of LTUs,
called hidden layers, and one output layer.
The behaviour of the network is determined by the layout of the neurons, by
the weights of the inter-neuron connections and by the response of the
neurons to the input, described by the neuron response function ρ, which
maps the neuron input onto the neuron output.
An MLP learns by means of an iterative algorithm which computes the output
of every neuron in the net and measures the network's output error, i.e. the
difference between the desired output and the actual output of the network.
Its purpose is to minimize this error by acting on the choice of the
weights. This algorithm is commonly called Back-propagation (BP): for each
training instance, the algorithm feeds it to the net and computes the output
of every neuron in each consecutive layer (the forward pass). Then it
measures the network's output error and computes how much each neuron in the
last hidden layer contributed to each output neuron's error. It then
proceeds to measure how much of these error contributions came from each
neuron in the previous hidden layer, and so on, until the algorithm reaches
the input layer. This reverse pass efficiently measures the error gradient
across all the connection weights in the network, by propagating the error
gradient backwards through the network [47]. The mathematical details are
given in appendix A.
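The forward and backward passes described above can be sketched with a tiny network (a 2-4-1 layout with sigmoid activations, trained on the XOR toy problem; the sizes, learning rate and seed are arbitrary illustrative choices, not the TMVA MLP):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2 inputs -> 4 hidden "LTUs" -> 1 output, with bias terms.
W1, b1 = rng.normal(0, 1, (4, 2)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (1, 4)), np.zeros(1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0, 1, 1, 0], float)            # XOR targets

def forward(X):
    h = sigmoid(X @ W1.T + b1)               # forward pass: hidden layer
    out = sigmoid(h @ W2.T + b2).ravel()     # forward pass: output layer
    return h, out

def step(lr=1.0):
    """One back-propagation step on the mean squared output error."""
    global W1, b1, W2, b2
    h, out = forward(X)
    d_out = (out - y) * out * (1 - out)          # output-layer error term
    d_h = (d_out[:, None] @ W2) * h * (1 - h)    # error propagated backward
    W2 -= lr * d_out[None, :] @ h
    b2 -= lr * d_out.sum()
    W1 -= lr * d_h.T @ X
    b1 -= lr * d_h.sum(axis=0)
    return float(((out - y) ** 2).mean())

losses = [step() for _ in range(2000)]
assert losses[-1] < losses[0]    # the output error decreases with training
```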
4.6.1 Signal and Background definition
In order to train the discriminators in a binary classification problem, one
has to provide background and signal samples with large statistics.
The signal in this context consists of the muons from the τ → 3µ decay,
taken from the MC dataset detailed in Section 4.3.
From the results shown in Section 4.4 it is clear that the main sources of
background are kaons, pions and protons. In order to provide a sufficient
number of background examples for the training phase, I generated dedicated
MC datasets, the so-called "particle guns".
The particle-gun samples were produced without pile-up, with PYTHIA 8, for
the following particle types:
• pions: π± → X, with 499950 events;
• kaons: K± → X, with 498000 events;
• protons: p/p̄, with 58440 events;
where X includes all the possible decay products, according to the
branching ratios. Each event in these samples consists of two generated
particles, one particle and its corresponding antiparticle.
They are generated with a flat transverse momentum between 0 and 25 GeV
and a flat pseudorapidity between -2.5 and +2.5, as shown in figure 4.7, in
accordance with the end point of the spectrum reported in figure 4.3.
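A minimal sketch of such a gun, assuming (as the text suggests) one particle/antiparticle pair per event with flat pT and η (whether the two legs share the same kinematics is an illustrative choice, as are the field names):

```python
import random

random.seed(1)

def particle_gun_event(pdg_id, pt_max=25.0, eta_max=2.5):
    """One gun event: a particle plus its antiparticle, with uniform
    pT in [0, pt_max] GeV and uniform η in [-eta_max, eta_max]."""
    pt = random.uniform(0.0, pt_max)
    eta = random.uniform(-eta_max, eta_max)
    return [{"pdgId": pdg_id, "pt": pt, "eta": eta},
            {"pdgId": -pdg_id, "pt": pt, "eta": eta}]

event = particle_gun_event(211)          # a π+/π− pair (PdgId = 211)
assert event[0]["pdgId"] == -event[1]["pdgId"]
assert 0.0 <= event[0]["pt"] <= 25.0 and abs(event[0]["eta"]) <= 2.5
```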
In order to cross-check the performance of the muon ID algorithms on these
samples, the fake rates are reported in figure 4.8, as a function of the
kinematic variables. The efficiency for muons coming from K or π decays is also
Figure 4.7: η (a) and pT (b) distributions of the kaon, pion and proton
guns, at generator level, within the CMS acceptance.
Figure 4.8: Soft, Loose and Tight ID fake rates vs η (a) and pT (b).
computed, and the results are shown in figure 4.9.
These results are compatible with the ones shown in figure 4.5.
I performed the same study on the standard reconstruction algorithms, Global
muon and Tracker muon, since the related variables will then be fed to the
discriminator. The results are reported in appendix B.
Figure 4.9: Soft, Loose and Tight ID fake rates.
The following selections are imposed on signal and background:
- for the signal sample:
  - the reconstructed muon is matched at generator level with a genuine muon and comes from the decay of the tau lepton;
  - region of interest: pT < 6 GeV;
  - distance in the transverse plane, dxy, for true muons < 2 cm, to select tracks that are compatible with the nominal interaction point.
- for the background sample:
  - the reconstructed muon does not match a genuine muon at generator level;
  - distance in the transverse plane dxy < 2 cm.
Only muon candidates with pT < 6 GeV are considered in this study and
participate in the training for both signal and background.
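As a sketch, the selections above can be expressed as a single filter function; the dictionary keys are hypothetical names, not the actual ntuple branches:

```python
def passes_selection(cand, is_signal):
    """Apply the training selections. `cand` uses illustrative keys:
    'pt' (GeV), 'dxy' (cm), 'gen_matched' (matched to a genuine
    generator-level muon), 'from_tau' (that muon comes from a tau decay)."""
    if cand["pt"] >= 6.0:      # region of interest: pT < 6 GeV
        return False
    if cand["dxy"] >= 2.0:     # compatibility with the interaction point
        return False
    if is_signal:
        return cand["gen_matched"] and cand["from_tau"]
    return not cand["gen_matched"]  # background: no generator-level match
```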
4.6.2 Input Discriminating variables
A set of 24 variables is used to train the discriminator, chosen on the basis of
their discriminating power between genuine and fake muons. All the variables
are related to the capability to discriminate a minimum ionizing particle
(MIP) from a hadron, which also undergoes nuclear interactions.
The variables can be divided into 5 groups:
1. Tracker-track reconstruction related variables
- Muon_trackerLayersWithMeasurement: number of tracker layers fired by the track of the muon candidate that have been used in the fit of the tracker track;
- Muon_innerTrack_normalizedChi2: χ2 of the inner track fit divided by the number of degrees of freedom of the fit;
- Muon_Numberofvalidpixelhits: number of valid hits in the pixel detector.
2. Muon-system reconstruction related variables
- Muon_isTracker: Tracker muon reconstruction flag;
- MuonTrkKink: kink algorithm¹ applied to the global muon's inner track;
- MuonTrkRelChi2: sum of the χ2 estimates of the hits in the tracker with respect to the Global muon track;
- Muon_numberOfMatchedStations: number of muon stations containing segments matched with a tracker track;
- Muon_outerTrack_normalizedChi2: χ2 of the fit of the muon track as it is reconstructed in the muon system (standalone muon), divided by the number of degrees of freedom of the fit;
- Muon_outerTrack_muonStationsWithValidHit: number of stations in the muon system with valid hits (> 0);
- Muon_segmentCompatibility: compatibility of the track as it is reconstructed in the tracker with the muon hypothesis (∈ [0, 1]); it evaluates which crossed stations have matching muon segments and assigns a probability.
1 The kink algorithm takes the difference in φ between the predicted track position and the
actual recHit, squares it and divides it by the error in φ of the recHit position. The final
value is the sum of these contributions over all recHits.
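A minimal sketch of the kink estimate described in the footnote (an illustration of the idea, not the exact CMSSW implementation):

```python
def trk_kink(predicted_phi, rechit_phi, rechit_phi_err):
    """Kink estimate: for each recHit, square the difference between the
    predicted and measured phi, divide by the recHit phi uncertainty, and
    sum the contributions over all recHits."""
    return sum((p - m) ** 2 / e
               for p, m, e in zip(predicted_phi, rechit_phi, rechit_phi_err))
```

A track whose predictions match the recHits perfectly gives zero; large kinks (e.g. from an in-flight decay) inflate the sum.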
3. Overall muon reconstruction quality variables
- Muon_isGlobal: Global muon reconstruction flag;
- Muon_QInnerOuter: product of the electric charge of the Tracker track and the electric charge of the standalone muon track;
- Eta_diff = (Muon_outerTrack_eta - Muon_innerTrack_eta): difference in η between the track position in the muon system and in the tracker;
- Phi_diff = (Muon_outerTrack_phi - Muon_innerTrack_phi): difference in φ between the track position in the muon system and in the tracker;
- Muon_combinedQuality_chi2LocalMomentum: a scalar resulting from the matrix-vector product v^T M v, where v is the difference between the momentum vectors of the Standalone muon track and the Tracker track at the common surface and M is the related error matrix;
- MuonChi2LocalPosition: a scalar resulting from the matrix-vector product v^T M v, where v is the difference in position of the state of the trajectory on the surface of the first muon station between the Standalone muon and the Tracker track and M is the related error matrix;
- MuonGlbTrackProbability: probability of the χ2 associated to the global track refit being larger than the one observed;
- Muon_combinedQuality_glbKink: kink algorithm, calculating the Tracker kink applied on the Global track. In detail, I used this quantity as Log(2 + Muon_combinedQuality_glbKink), in order to avoid a large loss of muons after a pre-selection cut, due to the long tail of the distribution of this variable;
- Muon_combinedQuality_globalDeltaEtaPhi: squared difference in η and φ of the Standalone muon track and the Tracker track on the common surface during track matching.
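Two of the quantities above can be sketched directly: the v^T M v quadratic form behind chi2LocalMomentum and chi2LocalPosition, and the Log(2 + glbKink) transformation. Both functions are illustrative:

```python
import math
import numpy as np

def quadratic_form(v, M):
    """Scalar v^T M v: v is the Standalone-vs-Tracker difference vector and
    M the associated matrix (for a chi^2, M is the inverse error matrix)."""
    v = np.asarray(v, dtype=float)
    M = np.asarray(M, dtype=float)
    return float(v @ M @ v)

def glb_kink_feature(glb_kink):
    """Log(2 + glbKink) transformation used to tame the long tail of the
    glbKink distribution before the pre-selection cut."""
    return math.log(2.0 + glb_kink)
```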
4. Timing information
- Muon_timeAtIpInOut: time of arrival (time of flight) at the muon system for muons moving inside-out, assuming β = 1;
- Muon_timeAtIpInOutErr: uncertainty in the time of arrival at the muon system for muons moving inside-out, assuming β = 1. This is not the uncertainty of the measurement, but the sum of the differences between the measurements and the measurements weighted with their uncertainties.
5. Calorimetric information
- Muon_calEnergy_had: energy deposits in HCAL, compatible with the muon track both in the tracker and in the muon system.
In addition, the η and φ variables are also provided to the discriminator, in
order to exploit possible trends and correlations of the muon reconstruction
with a given subdetector or a particular spatial region in CMS.
Figure 4.10 shows the distributions of the main discriminating variables, for
the signal (true muons) and background (fake muons) samples after the selections
reported in section 4.6.1: the blue histograms are the background and the
red ones the signal.
The linear correlation between the input variables has been evaluated and
the results can be visualized in figure 4.11.
The two matrices are quite similar, except that, in the background sample,
the variables trkRelChi2 and innerTrack_normalizedChi2 are highly correlated
with TrkKink. However, since these variables are highly discriminating and
the performance of the final output is not degraded by the correlation, they
are kept in the training.
The high correlation between Muon_segmentCompatibility and
Muon_numberOfMatchedStations is due to the fact that the matched stations
are included in the definition of the segment compatibility.
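The linear correlation matrix can be reproduced outside TMVA with a few lines (a sketch; X is any array with one row per muon candidate and one column per input variable):

```python
import numpy as np

def linear_correlation_matrix(X):
    """Linear (Pearson) correlation matrix among the input variables,
    the same quantity TMVA reports for signal and background samples."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
```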
4.6.3 Algorithms setup and configuration
The overtraining of a given method is very common in ML and happens when
the number of features is too large, the model structure does not really
conform to the shape of the data, or the model is trained on limited statistics:
in this case the outcome of the training corresponds too closely to the input
dataset and may fail to fit additional new data and reliably predict
Figure 4.10: Distribution of some input discriminating variables,
normalized to their area.
future observations.
One can intuitively understand overtraining from the fact that informa-
tion from all past experience can be divided into two groups: information
that is relevant for the future (or the prediction) and irrelevant information
("noise").
Possible solutions are: to simplify the model by selecting one with fewer pa-
rameters; to gather more training data; to reduce the noise in the training
Figure 4.11: Correlation matrix among input discriminating variables, for
the signal (a) and for the background (b), provided by TMVA.
data (e.g. fix data errors and remove outliers).
Therefore the configuration of the BDT and MLP algorithms is chosen so as
to avoid overtraining without losing performance. Tables 4.6 and 4.7 report
the parameters of the MLP and BDT algorithms, respectively, as implemented
in the TMVA tool. Both the default configuration and the final one chosen
after the optimization phase are reported.
Table 4.6: MLP configuration: the names of the parameters correspond to
the names used by the TMVA tool.

Parameter        | Default value  | Final value   | Description
NeuronType       | sigmoid        | tanh          | neuron activation function
VarTransform     | None           | Normalisation | list of the variable transformations performed before training
NCycles          | 500            | 500           | number of epochs: the number of training cycles necessary to achieve a sufficiently good training of the network
HiddenLayers     | 2 HL (N, N-1)  | 1 HL (N+5)    | specifies the hidden-layer (HL) architecture: the number of HLs in the network and the number of neurons in each of them
TestRate         | 10             | 5             | number of training cycles between two tests; an overtraining test is performed every #th epoch, so this parameter sets the test frequency
ConvergenceTests | -1             | 1             | number of subsequent convergence tests which have to fail to consider the training completed (< 0 means the automatic convergence check is turned off)

In figure 4.12 the architecture of the MLP network with all the chosen parameters is shown.
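A minimal sketch of the chosen network topology, a single tanh hidden layer with N+5 neurons for N inputs; the weights here are random placeholders, not the trained ones:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer perceptron with tanh neurons,
    mirroring the topology chosen in table 4.6."""
    h = np.tanh(W1 @ x + b1)   # hidden layer activations
    return float(W2 @ h + b2)  # single output node

n_in = 26                      # 24 discriminating variables plus eta and phi
n_hidden = n_in + 5
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=n_hidden), 0.0
score = mlp_forward(np.zeros(n_in), W1, b1, W2, b2)
```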
4.6.4 Performance of the discriminators: Kolmogorov-
Smirnov test, response and ROC curve
In this section all the results of the training phase are reported, comparing the
ROC curves and the outputs of the algorithms, together with the considerations
about the best method to be used. The training of the algorithms has been
performed merging the two background guns, pions and kaons, to increase the
statistics.
As a general rule, a good way to know if an algorithm will work is to try it
Table 4.7: BDT configuration: the names of the parameters correspond to
the names used by the TMVA tool.

Parameter            | Default value | Final value | Description
NTrees               | 800           | 1000        | number of trees in the forest
MinNodeSize          | 5%            | 5%          | minimum number of events (in percentage) required in a leaf node; the value is relative to the total event sample size, i.e. all events used in the training
MaxDepth             | 3             | 2           | maximum depth of a tree
BoostType            | AdaBoost      | AdaBoost    | boosting algorithm for the trees in the forest: in order to make the decision trees robust against statistical fluctuations of the training sample, boosting, i.e. reweighting, is applied to the training sample
AdaBoostBeta         | 0.5           | 0.5         | learning rate η for the adaptive boost algorithm
UseBaggedBoost       | False         | True        | only a randomly chosen sub-sample of the event sample is used for boosting
BaggedSampleFraction | 0.6           | 0.6         | fraction of the sub-sample size used for boosting, relative to the event sample size
SeparationType       | GiniIndex     | GiniIndex   | criterion chosen for node splitting
nCuts                | 20            | 29          | number of cut values in a grid adapted to the variable distribution, used in finding the optimal cut in node splitting; the value -1 invokes an algorithm that tests all possible cuts on the training sample and finds the best one, but it is slower than the coarse grid
Figure 4.12: The MLP network architecture from TMVA tool.
out on new datasets. The solution I have adopted is to split the data into two
sets: the training set, used to train the model, and the test set, used to test it.
Evaluating the model on these sets it is possible to obtain an estimate of the
generalization error of the model and of the goodness of the created
discriminator (see appendix A for details). Both the signal and background samples
are therefore divided into subsets; in particular, 70% for training and 30% for
testing turned out to be the optimal choice, ensuring a good balance between
learning capability and overtraining.
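The 70/30 split can be sketched as follows (illustrative code, not the TMVA internal splitting):

```python
import random

def train_test_split(events, train_fraction=0.70, seed=7):
    """Random split into training and test sets, with the 70/30 proportion
    found to balance learning capability against overtraining."""
    idx = list(range(len(events)))
    random.Random(seed).shuffle(idx)
    n_train = int(round(train_fraction * len(events)))
    train = [events[i] for i in idx[:n_train]]
    test = [events[i] for i in idx[n_train:]]
    return train, test

train, test = train_test_split(list(range(100)))
```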
The shape of the two trained discriminators is shown in figure 4.13, for signal
and background muons. Test and training are superimposed.
One of the problems that a ML algorithm can encounter, as said in the
previous paragraph, is overtraining, or more simply a disagreement between
a good performance on the training set and a bad performance on the test set;
in order to assess the compatibility between the discriminator outputs obtained
on the test and training samples, a Kolmogorov-Smirnov (KS) test is performed,
which quantifies this disagreement and provides a probability value between 0
and 1: values close to zero indicate a small probability of compatibility.
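A minimal sketch of the two-sample KS statistic on which the test is based (the probability quoted below is then derived from this maximum distance):

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum distance
    between the empirical cumulative distributions of the two samples
    (here, discriminator outputs on the training and test sets)."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(s, x):
        # fraction of entries in s that are <= x
        return bisect.bisect_right(s, x) / len(s)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)
```

Identical samples give a distance of 0 (maximal compatibility); fully disjoint samples give 1.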
Figure 4.13: The MLP network (a) and BDT (b) response: training and
testing outputs are superimposed.
This test should in theory be used only for unbinned data and not for binned
data, as in the case of a histogram, for which the test returns a non-uniform
distribution between 0 and 1. However ROOT provides the function to
implement this statistical control². If the distributions of the two samples
under comparison are identical, the probability value becomes 1.
The KS test performed on the BDT and MLP gave the following results:
MLP: KS sig (bkg) probability = 1 (0.993).
BDT: KS sig (bkg) probability = 0.999 (0.563).
It can be noticed that the BDT response performance is not optimal for the
background.
The discrimination performance of both classifiers is shown in the ROC curves
in figure 4.14.
Figure 4.14: ROC curves. As a comparison, the ROC curve of the simplest
training algorithm, the Linear Discriminant, is also reported (see
Appendix A).
Since the ultimate goal of this study is to improve the discrimination of real
from fake muons, looking at the score for a given classifier, a set of possible
choices for a working point to be used as a selection is provided in table 4.8,
for several levels of background contamination. The first one appears to be
the most interesting, since it ensures a fake rate of around 1%.
2 The probability values for binned data will be shifted slightly higher than expected,
depending on the effects of the binning.
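The working-point scan behind table 4.8 can be sketched as follows: for each allowed background contamination, pick the loosest cut that satisfies it and report the corresponding signal efficiency (toy scores, illustrative code):

```python
def working_points(sig_scores, bkg_scores, bkg_levels=(0.01, 0.10, 0.30)):
    """For each background contamination level, find the loosest cut whose
    background efficiency does not exceed it, and report that cut together
    with the corresponding signal efficiency."""
    cuts = sorted(set(sig_scores) | set(bkg_scores))
    points = {}
    for level in bkg_levels:
        for cut in cuts:
            bkg_eff = sum(s > cut for s in bkg_scores) / len(bkg_scores)
            if bkg_eff <= level:
                sig_eff = sum(s > cut for s in sig_scores) / len(sig_scores)
                points[level] = (cut, sig_eff)
                break
    return points
```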
Table 4.8: Final performances.

MVA Method | Signal efficiency: from test sample (from training sample)
           | @B=0.01       | @B=0.10       | @B=0.30
MLP        | 0.975 (0.973) | 1.000 (1.000) | 1.000 (1.000)
BDT        | 0.978 (0.978) | 1.000 (1.000) | 1.000 (1.000)
The separation power (SP) is equal to 49.5, computed as the quotient of the
signal efficiency (ε) and the background efficiency (1-r), SP = ε/(1-r), where
r is the background rejection.
Even if the BDT reaches slightly higher efficiency values, at the same time
it shows an irregular profile (figure 4.14), because it suffers from overtraining,
in particular on the background dataset, as the KS test demonstrated. The
MLP showed much more robust performance and has thus been selected for
this analysis.
4.7 Validation in the control region DS → φπ
and in the Minimum Bias events
In order to validate the performance of the MLP-based low momentum muon
identification, efficiency and fake rate are computed in dedicated control
regions, completely uncorrelated with the event samples used for the training.
The control region chosen to check the performance on signal muons is
pp → DS → φπ, where the muon candidates come from φ → µµ. Indeed the
final state muons produced in this φ decay have a very soft pT spectrum,
like the muons coming from supersymmetric light boson decays or from LFV
decays of charged leptons. On the other hand, the production cross section
and BR involved in the DS → φπ → (2µ)π process are completely known
and well determined, so this process is used as a standard candle to probe
the efficiency of the new algorithm.
I generated ∼151k DS → φπ → (2µ)π events with PYTHIA 8, starting
from a Minimum Bias event configuration and requiring at least one DS
meson per event. The DS → φπ and φ → µµ decays are then forced with
BR = 1. The standard CMS event reconstruction algorithms are then applied
to the final state particles, as described in Section 4.2. The 2017 realistic data
taking conditions are simulated, e.g. the pile-up interactions are simulated
according to the distribution shown in figure 2.14 (b).
A set of selections is applied on reconstructed events and variables, in order
to select a good di-muon candidate:
- 2 opposite sign (OS) muons with pT > 0.5 GeV and |η| < 2.4;
- the invariant mass of the di-muon system is required to fall in the φ mass window 1 < mµµ < 1.04 GeV, while the invariant mass of the triplet is required to fall in the DS invariant mass peak 1.93 < mµµ+1tr < 2.01 GeV;
- at least one track with pT > 2 GeV, |η| < 2.4, charge ≠ 0, dxy < 0.3 cm and dz < 20 cm;
- at least 1 triplet with mass compatible with the DS invariant mass window, |charge| = 1 and χ2 of the vertex fit between 0 and 15.
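The di-muon mass requirement above can be sketched by building the invariant mass from the reconstructed (pT, η, φ) of the two muons (an illustrative helper, not the CMSSW candidate combiner):

```python
import math

MUON_MASS = 0.1057  # GeV

def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2, m=MUON_MASS):
    """Invariant mass of a muon pair built from (pT, eta, phi), as used to
    test the phi mass window 1 < m_mumu < 1.04 GeV."""
    def four_vector(pt, eta, phi):
        px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
        return math.sqrt(px * px + py * py + pz * pz + m * m), px, py, pz
    e1, px1, py1, pz1 = four_vector(pt1, eta1, phi1)
    e2, px2, py2, pz2 = four_vector(pt2, eta2, phi2)
    m2 = (e1 + e2) ** 2 - (px1 + px2) ** 2 - (py1 + py2) ** 2 - (pz1 + pz2) ** 2
    return math.sqrt(max(0.0, m2))
```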
If more than one triplet is found, the one with the smallest χ2 is chosen. The
invariant mass distributions before the selections on the mass windows are
shown in figure 4.15.
Figure 4.15: φ (a) and DS (b) invariant mass, before the selection in the mass
windows reported in the text.
The fake rate instead has been computed using a Minimum Bias MC dataset
and selecting events with at least one pion or kaon at generator level (by
requiring PdgId = 321 || PdgId = 211).
Figure 4.16 shows the shape of the discriminator in the MC DS → φπ sample,
evaluated on the muons coming from the φ decay, and in the MC MinimumBias
sample, evaluated on hadrons tagged as reconstructed muons.
Figure 4.16: Discriminator's shape in MC DS → φπ and MC
MinimumBias, for 2 muons.
I studied the efficiency and fake rate of the standard identification algorithms
(Soft, Loose and Tight ID) together with the efficiency of the MLP-based low
momentum muon identification. The results are plotted as a function of pT,
as shown in figure 4.17.
Finally, in table 4.9, the numerical values of these efficiencies are reported,
for transverse momentum between 0 and 5 GeV.
It should be noted that the MLP-based low momentum muon identification
(MLP-Ultra-Soft muon ID in the figure) shows high efficiency, comparable to
the one of the standard muon ID algorithms (Soft ID and Loose ID). In
addition, the fake rate obtained with the MLP-Ultra-Soft muon ID is reduced by
a factor of 12 with respect to the Loose ID and of 25 with respect to the Soft ID
in the lowest pT bins (0.5-2 GeV). As expected, the performance slightly
degrades when pT increases, for pT > 4 GeV, since the new algorithm has been
trained with pT < 6 GeV. In particular, the fake rate becomes comparable with
the Loose ID for pT > 4 GeV.
Figure 4.17: Efficiency vs pT for signal (MC DsPhiPi sample) and
background (MinimumBias sample).
In summary, the standard CMS Soft and Loose IDs, commonly used for
searches involving low-pT muons in the final state, show a very high efficiency,
close to 100%, for muons from the LFV τ decay, but also a large fake rate,
about 40-50%, for hadrons coming from pile-up interactions.
The new MLP-Ultra-Soft muon ID algorithm is optimized to identify low-pT
muons (pT < 4 GeV), targeting a close to 100% efficiency and an O(10−2) fake
rate. These performances have successfully been reached in the training phase,
and then cross-checked in independent control samples (DS → φπ, φ → µµ
for signal and Minimum Bias events for background).
In particular, for pT < 4 GeV, the efficiency of the new algorithm is comparable
to the Loose and Soft IDs, while an average reduction of a factor 4 is
observed in the hadron → muon fake rate in the same region. In the lowest
pT bins (0.5-2 GeV), a fake rate reduction of a factor 12 is observed with
respect to the Loose ID and of a factor 25 with respect to the Soft ID.
Table 4.9: Summary of the efficiency for all ID algorithms used in this study.
Conclusion
An innovative algorithm for low-momentum muon identification in the
CMS experiment has been presented, using DS → τντ → 3µντ as a physics
study case.
The pT spectrum of the muons coming from this decay is particularly soft,
with the most abundant contribution coming from the range below 5 GeV. This
study identified light flavor mesons erroneously identified as muons as a
relevant component of the background.
Although the standard muon reconstruction and identification algorithms show
excellent performance for pT > 4 GeV, further optimizations are needed in
the softer spectrum, in order to reach the same level of performance.
To this purpose, I have implemented a Machine Learning approach to
distinguish signal muons from background ones, exploiting multivariate
analysis. This technique consists in combining a set of variables in order to
extract one single output, capable of classifying signal and background by
exploiting underlying patterns and correlations. The discriminators have been
trained, providing events simulated with the Monte Carlo technique as input
(muons from DS → τντ, τ → 3µ for the signal and custom pion and kaon
guns for the background). In this work, I compared a Boosted Decision Tree
and a Neural Network, choosing the latter for its better performance.
The performance of the new MLP-based low momentum muon identification,
the MLP-Ultra-Soft muon ID, has been measured in dedicated signal-enriched
and background-enriched datasets, respectively DS → φπ and Minimum
Bias.
The new muon identification algorithm shows, for pT < 4 GeV, an efficiency
comparable to the Loose and Soft IDs, while an average reduction of a factor
4 is observed in the hadron → muon fake rate. In the lowest pT bins (0.5-2
GeV), a fake rate reduction of a factor 12 is observed with respect to the
Loose ID and of a factor 25 with respect to the Soft ID.
The results obtained show that Machine Learning techniques are a promising
alternative to conventional analysis. The algorithm proposed here will be
further optimized by introducing new variables, based on detector response
and reconstruction quality (e.g. cluster size of the hits, pull between the
measured hit and the one belonging to the fitted trajectory).
A further improvement in the training phase would come from exploiting
realistic physics processes (e.g. D+ → µ+νµK−π+), rather than the particle
guns. Indeed a limitation of this study is the statistics available in the
Monte Carlo datasets used to train the discriminators, partially overcome by
privately generating them at the Bari Tier-2 computing center.
The results achieved in this study for background suppression and pile-up
mitigation have been obtained in the Run 2 data taking conditions, with an
average of 30 pile-up interactions per bunch crossing. These results allow
the development of a strategy to preserve the sensitivity of the CMS experiment
in the search for new physics involving low momentum muons in the final
state. This will be extremely challenging in the Run 3 data taking conditions,
with 50 pile-up interactions, and in particular in the high luminosity scenario,
characterized by up to 200 pile-up interactions per bunch crossing.
Appendix A
Introduction to Multivariate
Analysis and statistical learning
At LHC, and more generally in High Energy Physics (HEP), large quantities
of data are gathered and, since many of the studied processes happen in a
tiny fraction of the collisions, sophisticated techniques are required to
discriminate the events of interest (defined as signal) from the background:
the strategy is to use input information from multiple variables from various
sources, thus performing the so-called Multivariate Analysis (MVA). In
particular it becomes useful in order to investigate very rare phenomena or to
reduce huge backgrounds.
The goal in this direction is the fusion between MVA techniques and Machine
Learning (ML) algorithms, which can automatize the analysis process.
A.1 Definitions and basic concepts
Classical statistics is also known as univariate, looking at one variable at
a time; however in a statistical problem there is often more than one variable
involved, so a univariate analysis can lead to wrong conclusions. It is often
necessary to study or measure more than one variable simultaneously to
understand a process or any set of samples with numerous measurements:
Multivariate Analysis is a statistical analytical approach that simultaneously
evaluates multiple input variables or features, called predictors, to provide
one output variable, called the response [48]; essentially, it is a tool to predict
the effect a change in one variable will have on other variables.
Machine Learning, or statistical learning, is a group of multivariate analytic
methods that train on a data sample in order to make predictions on unknown
datasets.
From a more strictly mathematical point of view, suppose that we observe
a quantitative response Y and p different predictors, X1, X2, ..., Xp. We
assume that there is some relationship between Y and X = (X1, X2, ..., Xp),
which can be written in the very general form as
Y = f(X) + ε
Here f is some fixed but unknown function of X1, X2, ..., Xp and ε is a random
error term, which is independent of X and has mean zero; f represents
the systematic information that X provides about Y [48].
In essence, statistical learning refers to a set of approaches for estimating f.
The aim of ML techniques is the application of statistical learning methods
to training data in order to estimate the unknown function f, minimizing
the reducible error, which arises from the fact that only an estimate of f is
possible.
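A toy illustration of estimating f from noisy data: the learner only sees (x, y) pairs generated as Y = f(X) + ε and fits a simple least-squares line (all numbers are illustrative):

```python
import random

# The "true" systematic part f is unknown to the learner, which only
# observes noisy (x, y) pairs drawn as y = f(x) + eps.
rng = random.Random(1)
f = lambda x: 2.0 * x + 1.0
data = [(x, f(x) + rng.gauss(0.0, 0.1)) for x in [i / 10 for i in range(50)]]

def fit_line(points):
    """Least-squares estimate of a linear f-hat from training pairs."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

slope, intercept = fit_line(data)
```

The fitted slope and intercept approach the true values 2 and 1; the residual spread is the irreducible error due to ε.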
A.1.1 Supervised and Unsupervised learning
The ML methods can be classified into supervised and unsupervised. In the
first case, the training data that feed the algorithm include the desired
solutions, called labels: the learning happens by examples, i.e. the algorithm
extracts patterns from the training data.
In contrast, unsupervised learning describes the situation in which for every
observation we have a vector of measurements but no associated response.
It is referred to as unsupervised because we lack a response variable that can
supervise our analysis [48]. This last type is not common in HEP (High
Energy Physics) and is not covered in this chapter.
Typical supervised learning tasks are classification and regression.
A.1.2 Classification vs Regression
The features can be characterized as either quantitative or qualitative.
Quantitative variables take on numerical values; in contrast, qualitative ones take
on values in one of several classes or categories. We tend to refer to problems
with a quantitative response as regression problems, while those involving a
qualitative response are often referred to as classification problems.
In the classification task, the algorithm is trained with datasets that contain
signal and background events (we will restrict ourselves here to the two-class
case, but many classifiers can in principle be extended to several classes) and
it must learn how to classify new datasets. For example, figure A.1 shows a
simple response of a classification algorithm in two dimensions. A possible
solution (without any misclassification) is shown by the curved decision
boundary, and any new event given by two coordinates would be classified
according to the two sides of this boundary.
Figure A.1: A simple 2D example for a classification problem. The circles
symbolise the "signal", events with Y = 1, the squares stand for the
"background", events with Y = 0 [47].
In the regression task, the value of some quantity of interest, called the target
value, is predicted, given a set of input features. A one-dimensional regression
problem is shown in figure A.2. The seven crosses represent the data points
("examples") and the smooth curve may be a solution formed by a statistical
learning method. Any new event given by an x-coordinate will result in a
y-coordinate output according to the learned curve.
A.1.3 How does one choose a statistical learning method?
In statistics, no one method dominates all others over all possible data sets:
selecting the best approach can be one of the most challenging parts of per-
forming statistical learning in practice [48].
Figure A.2: A simple one-dimensional example for a regression problem [47].
Firstly, a method is selected on the basis of whether the response is qualitative
or quantitative. Since the study of this thesis is based on classification
algorithms, we will concentrate on the qualitative setting.
It is possible, for example, to evaluate the performance of a classifier using
histograms of the output distributions for signal and background. Well
designed classification methods do not only give outputs 0 or 1, but give
continuous values in the interval [0, 1], which can be interpreted as a
probability. A value of 0.5 then means that the event could belong to either
class with almost the same probability.
The great advantage of a continuous output between 0 and 1 shows up when
a cut or threshold is defined to do the actual classification. The signal
efficiency ε is then given by the percentage of recognised "good" events (output
> cut) and the background rejection r (1 - background efficiency) is given by
the percentage of recognised "bad" events (output < cut) [47].
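These two quantities can be sketched directly from the classifier outputs (illustrative code):

```python
def efficiency_and_rejection(sig_out, bkg_out, cut):
    """Signal efficiency eps (fraction of signal outputs above the cut) and
    background rejection r = 1 - background efficiency, for a classifier
    output in [0, 1]."""
    eps = sum(o > cut for o in sig_out) / len(sig_out)
    bkg_eff = sum(o > cut for o in bkg_out) / len(bkg_out)
    return eps, 1.0 - bkg_eff
```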
How does one choose the best threshold? The ROC curve is a popular tool for
displaying the results for all possible cuts in the efficiency/rejection plane. The
name "ROC" is historic and comes from communications theory, as an acronym
for receiver operating characteristics. Figure A.3 shows an example of a
ROC curve: it starts with no rejection and 100% efficiency for cut = 0 and
ends at 100% rejection and no efficiency for cut = 1.
The dotted line (ε + r = 1) represents the ROC curve of a purely random
classifier or pre-scaling. A good classifier stays as far away from that line as
possible, toward the top right corner with full efficiency and full rejection.
The overall performance of a classifier, summarized over all possible
thresholds, is given by the area under the curve (AUC): a perfect classifier will have
an AUC = 1, whereas a purely random classifier will have an AUC = 0.5 [49].
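The AUC can be sketched via its probabilistic interpretation, without building the curve explicitly (illustrative code):

```python
def auc(sig_out, bkg_out):
    """Area under the ROC curve as the probability that a randomly chosen
    signal event scores higher than a randomly chosen background event
    (ties count one half)."""
    wins = sum((s > b) + 0.5 * (s == b) for s in sig_out for b in bkg_out)
    return wins / (len(sig_out) * len(bkg_out))
```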
Figure A.3: An example of a ROC curve: at the marked point an efficiency of
97% is achieved with a cut at 0.23, resulting in a background rejection of
80%. The separation power is 4.9 [47].
The larger the AUC, the better the classifier.
ROC curves are useful for comparing different classifiers, since they take into
account all possible thresholds: several curves may be compared depending
on the application and on the separation power, calculated as the quotient
of the signal efficiency and the background efficiency, SP = ε/(1-r) [47].
These criteria fix a specific cut and thus set a "working point" for the classifier.
A.1.4 Problems
Since the main task is to select a learning algorithm and train it on some
data, the two things that can go wrong are a "bad algorithm" and "bad data".
A first problem concerns an insufficient quantity of training data: the algorithm
needs thousands of examples to learn and it is crucial that the training data
be representative of the new cases we want to generalize to.
If the sample is too small, we will have sampling noise (i.e., nonrepresentative
data as a result of chance), but even very large samples can be nonrepresentative
if the sampling method is flawed. This is called sampling bias [49].
A critical part of the success of a ML project also arises from a good set of
features to train on. This process, called feature engineering [49], involves:
- Feature selection: selecting the most useful features to train on among existing features.
- Feature extraction: combining existing features to produce a more useful one.
- Creating new features by gathering new data.
The only way to know how well a model will generalize to new cases is to
actually try it out on new cases. A better option is to split the data into two
sets: the training set to train the model and the test set to test it. The error
rate on new cases is called the generalization error, and by evaluating the model
on the test set we get an estimate of this error. This value tells how well the
model will perform on new instances.
Monte Carlo simulations are a standard way of generating training data
examples. But they must be used with care: both underlying physics and
the detector response have to be understood very well to create a simulation
which generates events matching the experimental observations. Even very
small deviations or correlations that exist in the simulation but not in reality
may result in a trained method that handles simulated events perfectly, but
shows a behaviour like random guessing on real data [47].
Overfitting and Underfitting
If the training error is low (i.e. the model makes few mistakes on the training
set) but the generalization error is high (i.e. the model does not generalize
well), it means that the model is overfitting the training data [49].
On the opposite side there is the underfitting phenomenon: it occurs when
the model is too simple to learn the underlying structure of the data. The
main options to fix this problem are: selecting a more powerful model, with
more parameters, or feeding better features to the learning algorithm.
Summarizing: the system will not perform well if the training set is too
small, or if the data is not representative, noisy, or polluted with irrelevant
features. The model needs to be neither too simple (it will underfit) nor too
complex (it will overfit) [49].
A.1.5 Data Preprocessing
Preprocessing data means transforming inputs x which are directly measured
by the detector into new inputs x' which are better suited to describe the
event in one or more of the following senses:
- the transformed inputs may make use of prior knowledge;
- the transformation may re�ect a symmetry that is inherent to all events;
- if the input space is very high-dimensional and it is unknown how to reduce
the dimensionality1, a transformation based on automatic procedures can be
very helpful [47].
Preprocessing can be useful to reduce correlations among the variables, to
transform their shapes into more appropriate forms, or to accelerate the re-
sponse time of a method.
For supervised methods there are four main variable transformation methods:
normalisation, decorrelation, PCA and gaussianisation. For more details see
Ref. [46].
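For illustration only (the helper names below are mine, not TMVA's), two of these transformations, normalisation and PCA-based decorrelation, can be sketched with NumPy:

```python
import numpy as np

def normalise(X):
    """Z-score normalisation: zero mean and unit variance per variable."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_transform(X):
    """Decorrelate the variables by rotating onto the eigenvectors of
    the covariance matrix (principal component analysis)."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]     # sort by explained variance
    return Xc @ eigvecs[:, order]

# Two strongly correlated toy variables
rng = np.random.default_rng(1)
x1 = rng.normal(0, 2, 2000)
X = np.column_stack([x1, 0.8 * x1 + rng.normal(0, 1, 2000)])

Xn = normalise(X)
Xp = pca_transform(X)
off_diag = np.cov(Xp, rowvar=False)[0, 1]  # ~0: variables are decorrelated
```

After the rotation the off-diagonal covariance vanishes, which is exactly the decorrelation effect mentioned above.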
A.2 Classification methods
There are many possible classification techniques, or classifiers, that one
might use to predict a qualitative response, but I will discuss only the methods
that I used in my analysis, starting from the simplest (Linear Discriminant),
used only for a performance comparison, up to the two more computer-intensive
methods (Boosted Decision Tree and Multilayer Perceptron).
A.2.1 Linear Discriminant Analysis
The linear discriminant analysis (LD) algorithm uses a linear model, where
linear refers to the discriminant function y(x), linear in the parameters β:

y(x) = x^T β + β_0

where β_0 (the bias) is adjusted so that y(x) ≥ 0 for S and y(x) < 0 for B [46].
1Many ML problems involve thousands or even millions of features for each training
instance. Not only does this make training extremely slow, it can also make it much harder
to find a good solution: this is referred to as the curse of dimensionality. Many things behave very
differently in high-dimensional space: the more dimensions the training set has, the greater
the risk of overfitting it [49].
Description and implementation
Assuming that there are m+1 parameters β_0, ..., β_m to be estimated using a
training set comprised of n events, the equation for β is Y = Xβ, where

Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\quad \text{and} \quad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1m} \\ 1 & x_{21} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nm} \end{pmatrix}

The constant column in X represents the bias β_0, absorbed into the vector
β, and Y is composed of the target values, with y_i = 1 if the i-th event belongs
to the S class and y_i = 0 if the i-th event belongs to B.
Applying the method of least squares, we obtain the normal equations
for the classification problem, given by

X^T X β = X^T Y \iff β = (X^T X)^{-1} X^T Y

If weighted events are used, this is simply taken into account by introducing
a diagonal weight matrix W and modifying the normal equations as follows:

β = (X^T W X)^{-1} X^T W Y

Considering two events x_1 and x_2 on the decision boundary, we have
y(x_1) = y(x_2) = 0 and hence (x_1 − x_2)^T β = 0. Thus the LD can be
geometrically interpreted as determining the decision boundary by finding
a vector β orthogonal to it [46].
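As a sketch of these normal equations (on a toy two-variable dataset of my own; the actual LD implementation is the one of ref. [46]), the fit can be written directly with NumPy. With targets encoded as {1, 0}, the natural cut on the least-squares response is at 0.5:

```python
import numpy as np

def fit_linear_discriminant(X, y, weights=None):
    """Solve beta = (X^T W X)^(-1) X^T W Y; a column of ones absorbs beta_0."""
    Xb = np.column_stack([np.ones(len(X)), X])
    W = np.diag(weights) if weights is not None else np.eye(len(X))
    return np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)

def ld_response(X, beta):
    return np.column_stack([np.ones(len(X)), X]) @ beta

# Toy problem: signal (y=1) and background (y=0) shifted in two variables
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(+1, 1, (500, 2)), rng.normal(-1, 1, (500, 2))])
y = np.concatenate([np.ones(500), np.zeros(500)])

beta = fit_linear_discriminant(X, y)
accuracy = np.mean((ld_response(X, beta) > 0.5) == y)
```

The `weights` argument mirrors the diagonal weight matrix W used for weighted events.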
Variable ranking
This implementation of the LD provides a definition of input variable importance,
or ranking, based on the coefficients of the variables in the linear combination
that forms the decision boundary. The order of importance of the variables
is assumed to agree with the order of the absolute values of the coefficients.
Performance
The LD is optimal for Gaussian distributed variables with linear correlations:
no discrimination is achieved when a variable has the same sample
mean for signal and background, but the LD can often benefit from suitable
transformations of the input variables.
A.2.2 Boosted Decision Tree
A decision tree is a binary tree classifier similar to the one in figure A.4.
Figure A.4: Schematic view of a decision tree [46].
Starting from the root node, a sequence of recursive binary splits using the
discriminating variables x_i is applied to the data. Each split uses the variable
that at this node gives the best separation between signal and background
when being cut on. The same variable may thus be used at several nodes,
while others might not be used at all. The node splitting stops once it has
reached the minimum number of events, the maximum number of nodes or
the maximum depth, all specified in the BDT configuration. The leaf nodes
at the bottom end of the tree are labeled "S" and "B" depending on the
majority of events that end up in the respective nodes [46].
Description and implementation
The decision tree is able to split the phase space into a large number of
hypercubes or regions, each of which is identified as either "signal-like" or
"background-like". The resulting decision boundary is not linear.
Each observation is assigned to the most commonly occurring class among the
training observations in the region to which it belongs. To grow such a
structure, a criterion called the classification error rate is used: it is defined
as the fraction of the training observations in a region that do not belong
to the most common class [48].
However, it turns out that the classification error is not sufficiently sensitive for
tree-growing: two other measures are preferable. One is the Gini index, a
measure of the total variance across the K classes, better thought of as a measure
of node purity: a small value indicates that a node contains predominantly
observations from a single class, so Gini = 0 means that a node is pure. The
other measure is the cross-entropy, which takes on a small value if the m-th
node is pure. They are very similar: Gini is slightly faster to compute and
tends to isolate the most frequent class in its own branch of the tree; entropy
tends to produce slightly more balanced trees [49].
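The two impurity measures can be made concrete with a short sketch (pure NumPy, taking the K class fractions of a node as input):

```python
import numpy as np

def gini(p):
    """Gini index: sum_k p_k (1 - p_k) over the class fractions in a node."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

def cross_entropy(p):
    """Cross-entropy (deviance): -sum_k p_k log(p_k), with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

pure  = [1.0, 0.0]   # node containing a single class
mixed = [0.5, 0.5]   # maximally impure two-class node
```

Both measures vanish for a pure node and are maximal for the 50/50 mixture, which is exactly the behaviour a node-splitting criterion needs.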
A shortcoming of decision trees is their instability with respect to statistical
fluctuations in the training sample from which the tree structure is
derived. For example, if two input variables exhibit similar separation power,
a fluctuation in the training sample may cause the tree-growing algorithm
to split on one variable, while the other variable could have been
selected without that fluctuation. In such a case the whole tree structure
below this node is altered, possibly resulting in a substantially different
response.
This problem is overcome by constructing multiple decision trees (a forest)
and classifying an event by a majority vote of the classifications done by
each tree in the forest, using bagging, randomising and boosting techniques.
The trees are derived from the same training ensemble by reweighting events,
and are finally combined into a single classifier which is given by a (weighted)
average of the individual decision trees [46].
Bagging
The decision tree presented so far has the drawback of high variance: if we
split the training data into two parts at random and fit a decision tree to
both halves, the results could be quite different. In contrast, a procedure
with low variance will yield similar results if applied repeatedly to distinct
data sets. Bagging is a general-purpose procedure for reducing the variance
of a statistical learning method: given a set of n independent observations
each with variance σ², the variance of their mean is σ²/n, so averaging a set
of observations reduces the variance [48].
It is a resampling technique where a classifier is repeatedly trained using
different resampled training events, such that the combined classifier represents
an average of the individual classifiers. Resampling includes the possibility
of replacement, which means that the same event is allowed to be (randomly)
picked several times from the parent sample. The algorithm predicts the final
answer via simple majority voting: the overall prediction is the most commonly
occurring class among the predictions.
Training several classifiers with different resampled training data and combining
them into a collection results in an averaged classifier that is more
stable with respect to statistical fluctuations in the training sample [46].
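A minimal bagging sketch (decision stumps stand in for full trees, and the dataset is an invented toy sample) shows the resample-train-vote cycle:

```python
import numpy as np

def fit_stump(X, y):
    """Pick the single cut (variable, threshold, sign) with best accuracy."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (+1, -1):
                acc = np.mean((sign * (X[:, j] - t) > 0).astype(int) == y)
                if best is None or acc > best[0]:
                    best = (acc, (j, t, sign))
    return best[1]

def stump_predict(X, stump):
    j, t, sign = stump
    return (sign * (X[:, j] - t) > 0).astype(int)

def bagging(X, y, n_classifiers=25, seed=0):
    """Train each stump on a bootstrap resample (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    return [fit_stump(X[idx], y[idx])
            for idx in (rng.integers(0, len(X), len(X))
                        for _ in range(n_classifiers))]

def bagging_predict(X, stumps):
    """Majority vote over the collection of classifiers."""
    votes = np.mean([stump_predict(X, s) for s in stumps], axis=0)
    return (votes > 0.5).astype(int)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(+1, 1, (100, 2)), rng.normal(-1, 1, (100, 2))])
y = np.array([1] * 100 + [0] * 100)
accuracy = np.mean(bagging_predict(X, bagging(X, y)) == y)
```

`rng.integers(0, len(X), len(X))` is the bootstrap draw: the same event may be picked several times, exactly the resampling with replacement described above.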
Random forest
Differently from bagging, with the random forest method each tree is grown
using only one (resampled) subset of the original training events, and at each
split a random subset of the predictors is considered as split candidates.
The main difference between bagging and random forests is thus the choice
of the predictor subset size. Using a small subset size in building a random
forest will typically be helpful when we have a large number of correlated
predictors [48]. Suppose that there is one very strong predictor in the data
set, along with a number of other moderately strong predictors; then in the
collection of bagged trees, most or all of the trees will use this strong predictor
in the top split. Consequently all of the bagged trees will look quite similar
to each other. Hence the predictions from the bagged trees will be highly
correlated, and unfortunately averaging many highly correlated quantities
does not lead to as large a reduction in variance as averaging many uncorrelated
quantities. This means that bagging will not lead to a substantial reduction in
variance over a single tree. Random forests overcome this problem by forcing
each split to consider only a subset of the predictors: on average a fraction
of the splits will not even consider the strong predictor, and the other
predictors will have more of a chance. We can think of this process as
decorrelating the trees.
Boosting
Boosting works in a similar way to bagging, except that the trees are grown
sequentially: each tree is built using information from the previously grown
trees. In this case each tree is fit on a modified version of the original data
set [48]. The boosting approach thus learns slowly, and in general statistical
learning approaches that learn slowly tend to perform well.
Boosting has three tuning parameters: the number of trees (unlike bagging
and random forests, boosting can overfit if this number is too large); the
shrinkage parameter (which controls the rate at which boosting learns,
typically 0.01 or 0.001 depending on the problem); and the number of splits
in each tree, which controls the complexity of the boosted ensemble.
There are two popular boosting methods: adaptive and gradient boost.
Adaptive Boost (AdaBoost) is the most popular: events that were misclassified
during the training of a decision tree are given a higher event weight in
the training of the following tree. Starting with the original event weights
when training the first decision tree, the subsequent tree is trained using a
modified event sample where the weights of previously misclassified events
are multiplied by a common boost weight α, and so on. The algorithm stops
when the desired number of predictors is reached [46].
The boost weight is derived from the misclassification rate, err, of the
previous tree:

α = \frac{1 − err}{err}

The weights of the entire event sample are then renormalised such that the
sum of weights remains constant. We define the result of an individual
classifier as h(x), encoded for signal and background as h(x) = +1 and −1,
respectively. The boosted event classification y_Boost(x) is then given by

y_{Boost}(x) = \frac{1}{N_{collection}} \sum_{i}^{N_{collection}} \ln(α_i) \cdot h_i(x)

where the sum is over all classifiers in the collection. Small (large) values of
y_Boost(x) indicate a background-like (signal-like) event.
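A compact sketch of this AdaBoost loop (weighted decision stumps as the weak classifiers, on an invented toy sample; signal and background are encoded as h = +1 and −1):

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Single cut minimising the weighted misclassification rate err."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(sign * (X[:, j] - t) > 0, 1, -1)
                err = w[pred != y].sum() / w.sum()
                if best is None or err < best[0]:
                    best = (err, (j, t, sign))
    return best

def stump_predict(X, stump):
    j, t, sign = stump
    return np.where(sign * (X[:, j] - t) > 0, 1, -1)

def adaboost(X, y, n_classifiers=15):
    w = np.ones(len(X)) / len(X)
    stumps, alphas = [], []
    for _ in range(n_classifiers):
        err, stump = fit_weighted_stump(X, y, w)
        err = min(max(err, 1e-9), 0.5 - 1e-9)     # guard degenerate rates
        alpha = (1.0 - err) / err                 # the boost weight
        w[stump_predict(X, stump) != y] *= alpha  # boost misclassified events
        w /= w.sum()                              # renormalise the weights
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def y_boost(X, stumps, alphas):
    """(1/N_collection) sum_i ln(alpha_i) h_i(x): large values are signal-like."""
    out = sum(np.log(a) * stump_predict(X, s) for s, a in zip(stumps, alphas))
    return out / len(stumps)

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(+0.7, 1, (150, 2)), rng.normal(-0.7, 1, (150, 2))])
y = np.array([1] * 150 + [-1] * 150)
stumps, alphas = adaboost(X, y)
accuracy = np.mean(np.sign(y_boost(X, stumps, alphas)) == y)
```

Since err < 0.5 for the best stump, every boost weight α is greater than 1, so misclassified events gain weight in the next round, as the text describes.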
Gradient Boost is a simple additive expansion approach. It works by
sequentially adding predictors to an ensemble, each one correcting its
predecessor. However, instead of modifying the instance weights at every
iteration like AdaBoost, it tries to fit the new predictor to the residual
error made by the previous predictor [49].
Its robustness can be enhanced by reducing the learning rate.
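A residual-fitting sketch (regression stumps on an invented one-dimensional toy target; `learning_rate` plays the role of the shrinkage parameter mentioned earlier):

```python
import numpy as np

def fit_regression_stump(x, r):
    """Single split on x minimising the squared error of the residuals r."""
    best = None
    for t in np.unique(x)[:-1]:          # last value would leave one side empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or sse < best[0]:
            best = (sse, (t, left.mean(), right.mean()))
    return best[1]

def stump_value(x, stump):
    t, vl, vr = stump
    return np.where(x <= t, vl, vr)

def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Each new stump is fit to the residual error of the current ensemble."""
    pred = np.full(len(x), y.mean())
    for _ in range(n_trees):
        residual = y - pred                  # what the ensemble still misses
        stump = fit_regression_stump(x, residual)
        pred += learning_rate * stump_value(x, stump)  # shrinkage: learn slowly
    return pred

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-3, 3, 200))
y = np.sin(x) + rng.normal(0, 0.1, 200)
mse = np.mean((y - gradient_boost(x, y))**2)
```

Each round fits the residual of the previous rounds rather than reweighting events, which is the key difference from AdaBoost.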
Variable ranking
A ranking of the BDT input variables is derived by counting how often the
predictors are used to split decision tree nodes, and by weighting each split
occurrence by the squared separation gain it has achieved and by the number
of events in the node (as measured by the Gini index). This measure of the
variable importance can be used for a single decision tree as well as for a
forest [46].
Performance
Depending on the problem, if the relationship between the features and the
response is well approximated by a linear model, then a traditional
classification algorithm such as the LD will work well. If instead there is a
highly non-linear and complex relationship, then decision trees may
outperform classical approaches [48].
Decision trees are also insensitive to the inclusion of poorly discriminating
input variables. While for artificial neural networks, as we will see, it is
typically more difficult to deal with such additional variables, the decision
tree training algorithm will essentially ignore non-discriminating variables,
since at each node splitting only the best discriminating variable is used.
However, the simplicity of decision trees has the drawback that their
theoretically best performance is generally lower than that of other techniques
like neural networks; moreover they are more prone to overtraining.
A.2.3 Multilayer Perceptron
Artificial Neural Networks (ANN) were first introduced in data analysis
by the neurophysiologist W. McCulloch and the mathematician W. Pitts [49].
An ANN is a computational model vaguely inspired by the biological neural
connections that constitute a human brain, specifically suited to non-linear
learning problems and capable of computing any logical proposition we want.
By applying an external signal to some binary on/off inputs, the network is
put into a defined state that can be measured from the response of one or
more binary outputs. One can therefore view the neural network as a mapping
from a space of input variables x_1, ..., x_n onto a one-dimensional (e.g. in
the case of a signal vs background discrimination problem) or multi-dimensional
space of output variables y_1, ..., y_m [46].
There are several types of ANN algorithms, but I will discuss here only the
one used in this work: the Multilayer Perceptron (MLP).
Description and implementation
The basic unit of computation in a neural network is the neuron, often called
a node or unit. The Perceptron, invented in 1957 by Rosenblatt [49], is a
particular artificial neuron, called a Linear Threshold Unit (LTU) (see figure
A.5): the inputs and output are numbers rather than binary values, and
each input connection is associated with a weight, which tells the neuron to
respond more to one input and less to another.
Figure A.5: An example of a linear threshold unit [49].
The LTU computes a weighted sum of its inputs,

z = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = w^T x,

then applies a step function (commonly the Heaviside function or the sign
function) to that sum and outputs a non-linear ℝ → ℝ function, called the
activation function:

h_w(x) = step(z) = step(w^T x)

The purpose of the activation function is to introduce non-linearity into the
output of a neuron. This is important because most real-world data is
non-linear and we want neurons to learn these non-linear representations.
There are several activation functions we may encounter in practice: linear,
sigmoid, tanh or radial, but tanh and sigmoid are the most used.
A single LTU can be used for simple linear binary classification: it computes
a linear combination of the inputs and, if the result exceeds a threshold, it
outputs the positive class, otherwise the negative one.
Training an LTU means finding the right values for the weights w_i.
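A hand-wired LTU illustrates this (the weights below are chosen by hand so that the unit computes a logical AND, a classic linearly separable problem; this example is mine, not from the references):

```python
import numpy as np

def ltu_output(x, w):
    """Linear threshold unit: z = w^T x, output = step(z)."""
    z = np.dot(w, x)
    return 1 if z >= 0 else 0          # Heaviside step as the activation

# Weights for a logical AND of two binary inputs; the last weight
# multiplies the constant bias input x0 = 1.
w = np.array([1.0, 1.0, -1.5])

outputs = [ltu_output(np.array([a, b, 1.0]), w)
           for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
# outputs == [0, 0, 0, 1]: the unit fires only when both inputs are on
```

Training would mean searching for such a weight vector instead of fixing it by hand.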
A Perceptron is simply composed of a single layer of LTUs, with each neuron
connected to all the inputs: these connections are often represented using
special passthrough neurons called input neurons, which just output whatever
input they are fed. Moreover, an extra bias feature is generally added (x_0 =
1), typically represented using a special type of neuron called a bias neuron,
which just outputs 1 all the time. The main function of the bias is to provide
every node with a trainable constant value (in addition to the normal
inputs that the node receives). Figure A.6 shows, for example, a Perceptron
with 2 inputs and 3 outputs: it can classify instances simultaneously into
three different binary classes, which makes it a multi-output classifier.
Figure A.6: An example of a Perceptron diagram [49].
Generally the Perceptron is trained in this way: it is fed one training instance
at a time and for each instance it makes its predictions. For every output
neuron that produced a wrong prediction, it reinforces the connection weights
from the inputs that would have contributed to the correct prediction.
To improve the Perceptron performance, a multi-layer structure has been
introduced: it is composed of one passthrough input layer, one or more layers
of LTUs, called hidden layers, and one final layer of LTUs called the output
layer. Except for the latter, every layer includes a bias neuron and is fully
connected to the next layer (see figure A.7): this simple structure is an MLP.
When an ANN has two or more hidden layers it is called a Deep Neural
Network (DNN).
The behaviour of the network is determined by the layout of the neurons, the
weights of the inter-neuron connections and the response of the neurons to
the input, described by the neuron response function ρ that maps the neuron
inputs i_1, ..., i_n onto the neuron output.
In a simple MLP we can change the number of layers, the number of neurons
per layer, the type of activation function used in each layer and the weight
initialization logic: what is the best combination of these parameters?
A practical rule is that for an MLP a single hidden layer is sufficient to
approximate a given continuous function to any precision, provided that
a sufficiently large number of neurons is used in the hidden layer. If the
available computing power and the size of the training data sample suffice,
one can increase the number of neurons in the hidden layer until the optimal
performance is reached. It is likely that the same performance can be
achieved with a network of more than one hidden layer and a potentially
much smaller total number of hidden neurons; this would lead to a shorter
training time and a more robust network [46].

Figure A.7: Multilayer perceptron with one hidden layer [46]. The y_ji are
the labels of the target class of a given sample.

In this analysis one hidden layer with a large number of neurons was chosen
because, when increasing the number of hidden layers, the shape of the
discriminator improved but the algorithm's performance got worse.
A Multilayer Perceptron learns by means of an iterative algorithm, which
computes the output of every neuron in the net and measures the network's
output error, i.e. the difference between the desired output and the actual
output of the network. Its purpose is to minimize this error by acting on the
choice of the weights. This algorithm is commonly called Back-propagation
(BP); to reduce the number of iterations it is possible to use a variant of the
method, labeled BFGS.
Back-propagation (BP)
It is the most common algorithm for adjusting the weights that optimise the
classification performance of the neural network. In detail, the output of
a network (here for simplicity assumed to have a single hidden layer with
a tanh activation function, and a linear activation function in the output
layer) is given by

y_{ANN} = \sum_{j=1}^{n_h} y_j^{(2)} w_{j1}^{(2)} = \sum_{j=1}^{n_h} \tanh\left( \sum_{i=1}^{n_{var}} x_i w_{ij}^{(1)} \right) \cdot w_{j1}^{(2)}

where n_var and n_h are the numbers of neurons in the input layer and in the
hidden layer, respectively, w^{(1)}_{ij} is the weight between input-layer neuron i
and hidden-layer neuron j, and w^{(2)}_{j1} is the weight between hidden-layer
neuron j and the output neuron.
During the learning process the network is supplied with N training events
x_a = (x_1, ..., x_{n_var})_a, a = 1, ..., N. For each training event a, the neural
network output y_{ANN,a} is compared to the desired output y_a ∈ {1, 0}.
An error function E, measuring the agreement of the network response with
the desired one, is defined by

E(x_1, \ldots, x_N \,|\, w) = \sum_{a=1}^{N} E_a(x_a | w) = \sum_{a=1}^{N} \frac{1}{2} \left( y_{ANN,a} - y_a \right)^2

where w denotes the ensemble of adjustable weights in the network. The set
of weights that minimises the error function can be found using the method of
steepest or gradient descent(*), provided that the neuron response function
is differentiable with respect to the input weights. Starting from a random
set of weights w^{(ρ)}, the weights are updated by moving a small distance in
w-space in the direction opposite to the gradient, −∇_w E:

w^{(ρ+1)} = w^{(ρ)} − η ∇_w E

where the positive number η is the learning rate, which is chosen carefully to
balance the speed of learning against the risk of failing to reach the global
minimum of the cost function.
The weights connected with the output layer are updated by

\Delta w_{j1}^{(2)} = -\eta \sum_{a=1}^{N} \frac{\partial E_a}{\partial w_{j1}^{(2)}} = -\eta \sum_{a=1}^{N} \left( y_{ANN,a} - y_a \right) y_{j,a}^{(2)}

and the weights connected with the hidden layer are updated by

\Delta w_{ij}^{(1)} = -\eta \sum_{a=1}^{N} \frac{\partial E_a}{\partial w_{ij}^{(1)}} = -\eta \sum_{a=1}^{N} \left( y_{ANN,a} - y_a \right) \left( 1 - \left( y_{j,a}^{(2)} \right)^2 \right) w_{j1}^{(2)} x_{i,a}

where we have used the derivative tanh′(x) = 1 − tanh²(x).
(*) Gradient descent is an optimization algorithm capable of tweaking parameters iteratively in
order to minimize the cost function (the distance between the model's predictions and the training
examples): it measures the gradient of the error function with respect to a parameter vector θ, which
represents the weights and is typically initialized randomly, and it moves in the direction of descending
gradient. Concretely, you start by filling θ with random values and then improve it gradually, taking one
small step at a time, attempting to decrease the cost function, until the algorithm converges to a minimum
(see figure A.8).
Figure A.8: Gradient descent [48].
An important parameter is the step size, which depends on the learning rate. If η is too small, the
algorithm will have to go through many iterations to converge, which will take a long time; if η is too
high, you might jump across the valley and end up on the other side, possibly even higher up than you
were before: this might make the algorithm diverge, with larger and larger values failing to find a good
solution.
When using gradient descent you should ensure that all features have a similar scale, or else it will
take much longer to converge.
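The whole BP loop above can be condensed into a short NumPy sketch: a single tanh hidden layer fed with a bias input, a linear output neuron, the squared-error function E, and gradient-descent updates of w(1) and w(2). The toy dataset, network size and learning rate are arbitrary choices for the example, and the gradient is averaged over the events rather than summed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sample: one input variable plus a bias input x0 = 1, targets y in {1, 0}
x_raw = np.concatenate([rng.normal(+1, 0.5, 200), rng.normal(-1, 0.5, 200)])
x = np.column_stack([x_raw, np.ones(400)])
y = np.concatenate([np.ones(200), np.zeros(200)])

n_var, n_h = 2, 5
W1 = rng.normal(0, 0.5, (n_var, n_h))   # input -> hidden weights  w(1)_ij
W2 = rng.normal(0, 0.5, n_h)            # hidden -> output weights w(2)_j1
eta = 0.05                              # learning rate

def forward(x):
    h = np.tanh(x @ W1)                 # hidden activations y(2)_j
    return h, h @ W2                    # linear output neuron y_ANN

for epoch in range(5000):
    h, y_ann = forward(x)
    delta = y_ann - y                   # dE/dy_ANN for E = 1/2 (y_ANN - y)^2
    grad2 = h.T @ delta / len(x)        # gradient w.r.t. w(2)
    # tanh'(z) = 1 - tanh(z)^2 enters the hidden-layer gradient
    grad1 = x.T @ (np.outer(delta, W2) * (1 - h**2)) / len(x)
    W2 -= eta * grad2                   # w <- w - eta * grad E
    W1 -= eta * grad1

h, y_ann = forward(x)
error = 0.5 * np.mean((y_ann - y)**2)
accuracy = np.mean((y_ann > 0.5) == y)
```

After training, the output error is small and the network separates the two classes with a cut at 0.5, the midpoint of the target encoding.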
A.2.4 BFGS
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method differs from
BP by the use of second derivatives of the error function to adapt the synapse
weights, through an algorithm composed of four main steps:
1. Two vectors are calculated: the vector of weight changes D, which
represents the evolution from one iteration of the algorithm (k−1) to the
next (k), and the vector Y of gradient errors.
2. Approximate the inverse of the Hessian matrix, H^{-1}, at iteration k by

H^{-1}_{(k)} = \frac{D \cdot D^T \cdot \left( 1 + Y^T \cdot H^{-1}_{(k-1)} \cdot Y \right)}{Y^T \cdot D} - D \cdot Y^T \cdot H + H \cdot Y \cdot D^T + H^{-1}_{(k-1)}

3. Estimate the vector of weight changes by

D_{(k)} = -H^{-1}_{(k)} \cdot Y_{(k)}
4. Compute a new vector of weights by applying a line search algorithm,
in which the error function is locally approximated by a parabola. The
algorithm evaluates the second derivatives and determines the point
where the minimum of the parabola is expected. The algorithm then
evaluates points along the line defined by the direction of the gradient
in weight space to find the absolute minimum. The weights at the
minimum are used for the next iteration.
The advantage of the BFGS method compared to BP is the smaller number
of iterations. However, because the computing time for one iteration is
proportional to the squared number of synapses, large networks are particularly
penalised [47].
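In practice one rarely codes BFGS by hand; as an illustration (assuming SciPy is available — this is not the TMVA implementation), a quasi-Newton BFGS minimisation of a simple least-squares error function, with its gradient supplied analytically, converges in few iterations:

```python
import numpy as np
from scipy.optimize import minimize

# Toy error function E(w): least-squares fit of a line, standing in
# for the network error function.
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 100)
y = 2.0 * x - 0.5 + rng.normal(0, 0.05, 100)

def error(w):
    r = w[0] * x + w[1] - y
    return 0.5 * np.sum(r**2)

def gradient(w):
    r = w[0] * x + w[1] - y
    return np.array([np.sum(r * x), np.sum(r)])

res = minimize(error, x0=np.zeros(2), method='BFGS', jac=gradient)
# res.x ends up close to the true parameters (2.0, -0.5)
```

The internal approximation of the inverse Hessian is what allows BFGS to take far fewer steps than plain gradient descent on the same problem.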
Variable ranking
The MLP neural network implements a variable ranking that uses the sum
of the squared weights of the connections between the variable's neuron in
the input layer and the first hidden layer. The importance I_i is given by

I_i = x_i^2 \sum_{j=1}^{n_h} \left( w_{ij}^{(1)} \right)^2, \quad i = 1, \ldots, n_{var}
Performance
Its main characteristics with respect to the BDT are: slower training, a
higher expected bias, and a high expected performance.
A.3 TMVA: Toolkit for Multivariate Analysis
TMVA is an integrated environment for machine learning, giving access to a
wide number of multivariate classification and regression algorithms. All
multivariate techniques in TMVA belong to the family of "supervised learning"
algorithms. Here the samples in a dataset are called events; the different
types of variables are included in a specific file format supported by ROOT,
called a Tree, with the .root extension. All information about the dataset
is included in the DataLoader and passed to a Factory object, which
organises the interaction between the user and the TMVA analysis steps.
It performs pre-analysis and preprocessing of the training data to assess
basic properties of the discriminating variables, then books, trains and tests
the selected classification methods. Each trained MVA method writes its
configuration and response to a result ("weight") file, which in the default
configuration has a readable XML format.
From this phase a .root file is produced, which can be read directly by means
of a TMVA function (TMVA::TMVAGui) that allows displaying a lot of
information: the input variable distributions, the variable correlations, the
efficiency of the classifier's cut, the ROC curves and some important visual
results about the booked classifiers (for example the BDT diagram or the
neural network architecture).
Then, to validate the developed algorithm, a Reader object is used: it reads
and interprets the weight files, and can be included in any C++ executable,
ROOT macro, or Python analysis job [46].
Appendix B
Performance of the muon reconstruction algorithms

This appendix reports all the results of the standard reconstruction
algorithms, Global and Tracker, mentioned in Section 4.6.1.
B.1 Signal and Background definition in the training phase

Figures B.1, B.2 and B.3 show the results for the efficiency of signal muons,
of fake muons and ε_{K,π→µ}, respectively.
The study of the background performance of the reconstruction algorithms is
reported in figures B.4 and B.5, for the fake rate and for muons from K or π
decay, respectively.
Starting from the Global algorithm, it is possible to observe that in the
endcap region (η > 1.4) the muons from K are the most efficient, with
86-88%; the muons not from K have 70-83%, and the minimum contribution
is provided by non-muons, with 0.8-1% efficiency. In the barrel region
(η < 1.4), instead, the efficiencies are all very high, between 80% and 99%:
values equal to 100% are a consequence of the numerator and the denominator
being equal. Observing the behaviour as a function of pT, it is clear that the
muons from K reach the plateau already at 3.5 GeV and the minimum
contribution is provided by non-muons. In general, however, for values
greater than 5 GeV the efficiencies are higher.
The Tracker algorithm has a reverse profile, with a decrease in the barrel
region: this means that in the barrel, when there is a drop, it is more difficult
to reconstruct a muon from a K, for example, than with the Global method.
The reason could be that in the barrel Global also has a greater number of
events than Tracker, so probably these muons are born from the interaction
of the kaon with the absorber between the HCAL and the DTs. They
therefore leave hits only in the muon system and not in the tracker. To
confirm this hypothesis, I observed the coordinates of the primary vertex in

(a) (b)
Figure B.1: Global and Tracker efficiency vs η (a) and pT (b), for muons
from τ → 3µ decay.

(a) (b)
Figure B.2: Global and Tracker fake rates vs η (a) and pT (b).
(a) (b)
Figure B.3: Global and Tracker efficiency vs η (a) and pT (b), for muons
from K or π decay.

(a) (b)
Figure B.4: Global and Tracker muon fake rate as a function of η (a) and
pT (b), for particle guns.
the transverse plane, to understand whether these muons are created close
to the bunch crossing or produced further forward, and so whether or not
they originate in the muon system. For the muons from K, I found a dxy
distribution with a tail up to 2 cm. Moreover, the study of the variable
simFlavour, which stores the flavour of the muon, confirmed that these
muons were not primary muons.
(a) (b)
Figure B.5: Global and Tracker efficiency for muons coming from K or π
decay as a function of η (a) and pT (b), for particle guns.
B.2 Validation phase results

The efficiencies of the standard reconstruction algorithms obtained in the
application phase are reported in figures B.6 and B.7.

(a) (b)
Figure B.6: Efficiency of the Global (a) and Tracker (b) algorithms vs η for
signal (DsPhiPi sample) and background (MinimumBias sample).

The signal efficiencies and the fake rates are summarized in table B.1.
(a) (b)
Figure B.7: Efficiency of the Global (a) and Tracker (b) algorithms vs pT
for signal (DsPhiPi sample) and background (MinimumBias sample).

Table B.1: Summary of the efficiencies for all the RECO algorithms used in
this study.
Bibliography
[1] De Angelis Alessandro and Pimenta Mário João Martins. Introduction
to particle and astroparticle physics: multimessenger astronomy and its
particle physics foundations; 2nd ed. Undergraduate lecture notes in
physics. Springer, Cham, 2018.
[2] O. Nachtmann. Elementary Particle Physics: Concepts and Phenomena.
1990.
[3] James William Rohlf. Modern Physics from A to Z. John Wiley and
Sons, New York, 1994.
[4] M. Tanabashi et al. Review of particle physics. Phys. Rev. D, 98:030001,
Aug 2018.
[5] TWiki. Summaries of CMS cross section measurements, 2019.
[6] Measurements of properties of the Higgs boson decaying into four leptons
in pp collisions at √s = 13 TeV. Technical Report CMS-PAS-HIG-16-041,
CERN, Geneva, 2017.
[7] Georges Aad et al. Evidence for the Higgs-boson Yukawa coupling to
tau leptons with the ATLAS detector. JHEP, 04:117, 2015.
[8] A. M. Sirunyan et al. Observation of Higgs boson decay to bottom
quarks. Phys. Rev. Lett., 121(12):121801, 2018.
[9] Albert M Sirunyan et al. Combined measurements of Higgs boson
couplings in proton-proton collisions at √s = 13 TeV. Eur. Phys. J.,
C79(5):421, 2019.
[10] Albert M Sirunyan et al. Observation of electroweak production of
same-sign W boson pairs in the two jet and two same-sign lepton final
state in proton-proton collisions at √s = 13 TeV. Phys. Rev. Lett.,
120(8):081801, 2018.
[11] Albert M Sirunyan et al. Evidence for the associated production of a
single top quark and a photon in proton-proton collisions at √s = 13
TeV. Phys. Rev. Lett., 121(22):221802, 2018.
[12] D. Contardo, M. Klute, J. Mans, L. Silvestris, and J. Butler. Technical
Proposal for the Phase-II Upgrade of the CMS Detector. 2015.
[13] John Butler. Searches for Dark Matter at the LHC. Dark matter searches
at the LHC. Technical Report ATL-PHYS-PROC-2018-055, CERN,
Geneva, Jun 2018.
[14] Roel Aaij et al. Search for Dark Photons Produced in 13 TeV pp Colli-
sions. Phys. Rev. Lett., 120(6):061801, 2018.
[15] A. Alavi-Harati et al. Observation of direct CP violation in K_{S,L} → ππ
decays. Phys. Rev. Lett., 83:22-27, 1999.
[16] Kazuo Abe et al. Observation of large CP violation in the neutral B
meson system. Phys. Rev. Lett., 87:091802, 2001.
[17] Roel Aaij et al. Measurement of CP asymmetry in D0 → K−K+ and
D0 → π−π+ decays. JHEP, 07:041, 2014.
[18] Christopher W. Walter. The Super-Kamiokande Experiment. pages
19-43, 2008.
[19] A. Abashian et al. The Belle Detector. Nucl. Instrum. Meth.,
A479:117-232, 2002.
[20] Bernard Aubert et al. The BaBar detector. Nucl. Instrum. Meth.,
A479:1-116, 2002.
[21] K. Hayasaka et al. Search for Lepton Flavor Violating Tau Decays into
Three Leptons with 719 Million Produced Tau+Tau- Pairs. Phys. Lett.,
B687:139-143, 2010.
[22] CMS Collaboration. Search for τ → 3µ decays using τ leptons produced
in D and B meson decays. 2019.
[23] CMS Collaboration. A Search for Beyond Standard Model Light Bosons
Decaying into Muon Pairs. 2016.
[24] Lyndon Evans and Philip Bryant. LHC Machine. JINST, 3:S08001,
2008.
[25] G. L. Bayatian et al. CMS Physics - Technical Design Report Volume
I:Detector Performance and Software. 2006.
[26] CMS Public Web. Public CMS Luminosity Information.
[27] LHC Commissioning. Peak Luminosity.
[28] The CMS Collaboration. Performance of CMS muon reconstruction in
pp collision events at √s = 7 TeV. Journal of Instrumentation,
7(10):P10002, Oct 2012.
[29] S. Chatrchyan et al. The CMS Experiment at the CERN LHC. JINST,
3:S08004, 2008.
[30] Bora Akgun. Performance of the CMS Phase 1 Pixel Detector. Technical
Report CMS-CR-2018-012, CERN, Geneva, Jan 2018.
[31] A. Dominguez et al. CMS Technical Design Report for the Pixel
Detector Upgrade. Technical Report CERN-LHCC-2012-016, CMS-TDR-11,
CERN, Geneva, Sep 2012.
[32] CMS Collaboration. Technical proposal for the upgrade of the CMS de-
tector through 2020. Technical Report CERN-LHCC-2011-006. LHCC-
P-004, Jun 2011.
[33] The CMS Collaboration. The performance of the CMS muon detector
in proton-proton collisions at √s = 7 TeV at the LHC. Journal of
Instrumentation, 8(11):P11002, nov 2013.
[34] The CMS Collaboration. Performance of the CMS muon detector and
muon reconstruction with proton-proton collisions at √s = 13 TeV.
Journal of Instrumentation, 13(06):P06015, jun 2018.
[35] The CMS Collaboration. The CMS trigger system. Journal of
Instrumentation, 12(01):P01020, jan 2017.
[36] CMS Collaboration. Pileup mitigation at CMS in 13 TeV data. 2019.
[37] S. Chatrchyan et al. Calibration of the CMS Drift Tube Chambers and
Measurement of the Drift Velocity with Cosmic Rays. JINST, 5:T03016,
2010.
[38] CMS Public Web. Performance of the CMS Muon Detectors in early
2018 collision runs.
[39] R. Fruhwirth. Application of Kalman filtering to track and vertex fitting.
Nucl. Instrum. Meth., A262:444–450, 1987.
[40] Muon Identification and Isolation efficiency on full 2016 dataset. Mar
2017.
[41] Muon identification and isolation efficiencies with 2017 and 2018 data.
Jul 2018.
[42] R. Brun and F. Rademakers. ROOT: An object oriented data analysis
framework. Nucl. Instrum. Meth., A389:81–86, 1997.
[43] Torbjörn Sjöstrand et al. An Introduction to PYTHIA 8.2. Comput.
Phys. Commun., 191:159–177, 2015.
[44] S. Agostinelli et al. GEANT4: A Simulation toolkit. Nucl. Instrum.
Meth., A506:250–303, 2003.
[45] Anders Ryd et al. EvtGen: A Monte Carlo Generator for B-Physics.
2005.
[46] Andreas Hocker et al. TMVA - Toolkit for Multivariate Data Analysis.
2007.
[47] Jens Zimmermann. Statistical Learning in High Energy and Astro-
physics. October 2005.
[48] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
An Introduction to Statistical Learning: With Applications in R.
Springer Publishing Company, Incorporated, 2014.
[49] Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and
TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems.
O'Reilly Media, Inc., 1st edition, 2017.
Acknowledgments