W/Z Tagging With UFO Jets
Xiang Chen (advisor: Liang Li)
Christof Sauer
Technical supervisor: Chris Malena Delitzsch
CPG meeting
Thursday, July 23, 2020
-
Introduction
• UFO jets are going to be the new baseline for large-R jets and new taggers need to be defined to
identify boosted hadronically decaying objects for physics analyses.
• UFO (Unified Flow Objects) = PFOs + TCCs
   ▪ Particle Flow Objects (PFOs): optimized for low pT
   ▪ Track-CaloClusters (TCCs): optimized for high pT
Figure source: https://indico.cern.ch/event/750283/contributions/3154797/attachments/1721841/2780124/20180925_MLB_OopsAllTCCs.pdf
-
Plan
1. Compare the performance of a variety of substructure variables (in combination with a simple cut on the jet mass) to identify the best discriminating substructure variables for the new jet collection.
2. Develop a simple three-variable tagger based on step 1.
3. Develop a simple mass-decorrelated tagger.
4. Develop an advanced tagger from combinations of several inputs (BDT or DNN).
-
Signal and Background Samples
▪ Signal: W’ -> WZ -> qqqq (two large-R jets, one containing the Z boson decay and the other the W boson decay)
▪ Background: dijets
▪ Several jet collections with different grooming algorithms are studied:
AntiKt10UFOCSSKSoftDropBeta100Zcut10Jets (VanillaSD with beta = 1.0 and z_cut = 0.1)
AntiKt10UFOCSSKRecursiveSoftDropBeta100Zcut5NinfJets (RecursiveSD with beta = 1.0, z_cut = 0.05 and N = infinity)
AntiKt10UFOCSSKBottomUpSoftDropBeta100Zcut5Jets (BottomUpSD with beta = 1.0, z_cut = 0.05)
AntiKt10UFOCSSKTrimmedPtFrac5SmallR20Jets (UFO trimming)
▪ Signal cuts, W boson:
fjet_truthJet_pt/1000. > 200
TMath::Abs(fjet_truth_dRmatched_particle_flavor) == 24
TMath::Abs(fjet_truthJet_eta) < 2
TMath::Abs(fjet_truth_dRmatched_particle_dR) < 0.7
fjet_truthJet_GhostBHadronsFinalCount == 0
fjet_truthJet_m > 50000.
fjet_truthJet_m < 100000.
▪ Signal cuts, Z boson:
fjet_truthJet_pt/1000. > 200
TMath::Abs(fjet_truth_dRmatched_particle_flavor) == 23
TMath::Abs(fjet_truthJet_eta) < 2
TMath::Abs(fjet_truth_dRmatched_particle_dR) < 0.75
fjet_truthJet_GhostBHadronsFinalCount == 0
fjet_truthJet_m > 60000.
fjet_truthJet_m < 110000.
▪ Soft drop references:
https://arxiv.org/abs/1402.2657
https://arxiv.org/pdf/1804.03657.pdf
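The W boson cuts can also be written as a per-jet predicate. The following is a minimal sketch, assuming each jet is available as a plain dict keyed by branch name (how the branches are read from the ntuple is left open); the helper `make_jet` and its toy values are made up for illustration.

```python
def passes_w_signal_cuts(jet):
    """Return True if one jet passes the W boson signal cuts.

    `jet` is assumed to be a dict keyed by branch name; momenta and
    masses are in MeV, as in the cut strings.
    """
    return (
        jet["fjet_truthJet_pt"] / 1000.0 > 200
        and abs(jet["fjet_truth_dRmatched_particle_flavor"]) == 24
        and abs(jet["fjet_truthJet_eta"]) < 2
        and abs(jet["fjet_truth_dRmatched_particle_dR"]) < 0.7
        and jet["fjet_truthJet_GhostBHadronsFinalCount"] == 0
        and 50000.0 < jet["fjet_truthJet_m"] < 100000.0
    )


def make_jet(pt, flavor, eta, dr, n_b, mass):
    """Hypothetical helper that builds a toy jet record."""
    return {
        "fjet_truthJet_pt": pt,
        "fjet_truth_dRmatched_particle_flavor": flavor,
        "fjet_truthJet_eta": eta,
        "fjet_truth_dRmatched_particle_dR": dr,
        "fjet_truthJet_GhostBHadronsFinalCount": n_b,
        "fjet_truthJet_m": mass,
    }


# Toy jets (made-up values): the first passes, the second is matched to
# a Z boson, the third is outside the W mass window.
toy_jets = [
    make_jet(350e3, -24, 0.5, 0.1, 0, 80e3),
    make_jet(400e3, 23, -1.0, 0.2, 0, 91e3),
    make_jet(500e3, 24, 1.2, 0.3, 0, 120e3),
]
selected = [passes_w_signal_cuts(j) for j in toy_jets]
```

The Z boson selection follows the same pattern with flavor 23, a dR cut of 0.75 and the 60-110 GeV mass window.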
-
Three-Variable W Tagger: Preliminary Result
Left figure: comparison of the two-variable tagger against the three-variable tagger
Right figure: performance of different jet collections with the three-variable tagger
▪ The three-variable tagger improves the performance
▪ BottomUp Soft Drop performs best among the UFO collections
-
Three-Variable W Tagger: Preliminary Result
Different cut combinations are compared in the same jet collection (working point = 50%)
• mass+D2+Ntrk500 performs best
• mass+D2+Tau21/KtDR shows little improvement over the two-variable taggers
-
Z tagging: two-variable tagger results at the 50% working point
[Figure panels: W, Z]
-
Z tagging: three-variable tagger results at the 50% working point
[Figure panels: Z, Z, W]
-
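Mass-decorrelated tagger
▪ Substructure observables are correlated with the jet mass, and MVA taggers exploit this correlation for resonance classification.
▪ As a result, signal and background become indistinguishable in mass after tagging, and it becomes impossible to perform a resonance search.
▪ The Designing Decorrelated Taggers (DDT) method is used to reduce the dependence of the observable on the jet mass.
▪ A jet scaling variable is defined: $\rho^{\mathrm{DDT}} = \log\left(\frac{m^2}{p_T \cdot 1\,\mathrm{GeV}}\right)$
▪ The decorrelated variable is $\mathrm{Var}^{\mathrm{DDT}} = \mathrm{Var} - a \cdot (\rho^{\mathrm{DDT}} - \mathrm{const})$, where $a$ is the slope of the linear dependence of Var on $\rho^{\mathrm{DDT}}$.
▪ But what about a non-linear dependence?
[Figure panels: BottomUp UFO, LCTopo]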
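The linear DDT correction can be illustrated with a few lines of Python. This is a sketch under assumptions: the slope is fitted with an ordinary least-squares fit directly to toy background points (the slides do not say how `a` is obtained), and the function names and numbers are made up for illustration.

```python
import math


def rho_ddt(mass_gev, pt_gev):
    """rho^DDT = log(m^2 / (pT * 1 GeV)), with m and pT in GeV."""
    return math.log(mass_gev ** 2 / (pt_gev * 1.0))


def fit_slope(xs, ys):
    """Least-squares slope of ys versus xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def ddt_transform(var, rho, slope, const):
    """Var^DDT = Var - a * (rho^DDT - const)."""
    return var - slope * (rho - const)


# Toy background whose variable depends exactly linearly on rho^DDT.
bkg_rho = [1.5, 2.0, 2.5, 3.0, 3.5]
bkg_var = [0.9 - 0.1 * r for r in bkg_rho]

a = fit_slope(bkg_rho, bkg_var)
const = 1.5
bkg_var_ddt = [ddt_transform(v, r, a, const) for v, r in zip(bkg_var, bkg_rho)]
```

After the transform the toy background variable no longer depends on rho^DDT. A purely linear correction like this cannot remove a non-linear dependence, which is the motivation for the k-NN approach.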
-
K-NN D2 Tagger
▪ Samples are split into two sets: training and testing
▪ Both contain signal and background:
training: 1E6 signal, 1E6 background
testing: 8.5E5 signal, 1E7 background (the signal sample may be too small)
separated by the testing weight
▪ Cuts: fjet_mass: 50-300 GeV
Pt: 200-2000 GeV
• Measure the X-th percentile of the background substructure distribution in bins of (ρ, pT)
• $D_2^{k\text{-NN}} = D_2 - D_2^{(7\%)}$ is the new decorrelated variable, where $D_2^{(7\%)}$ is the 7th percentile of the background $D_2$ at the jet's (ρ, pT)
Page .Page .
K-NN Fitting: Preliminary Result
[Figure: K-NN fitting]
-
ROC Curve
▪ After the cut fjet_m in [50, 300] GeV, the ROC curve of $D_2^{k\text{-NN}}$ is slightly lower than that of D2 over the pT range 200-2000 GeV.
▪ In the high-pT range (above 1000 GeV), however, $D_2^{k\text{-NN}}$ performs better (see the following pages).
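For reference, a ROC point and the background rejection at a fixed working point can be computed as below. This is a minimal sketch, assuming an upper cut on the variable (signal-like jets have low values, as for D2) and unweighted toy inputs; the function names are illustrative.

```python
def roc_points(sig, bkg):
    """(signal efficiency, background rejection) for cuts of the form value <= t."""
    points = []
    for t in sorted(set(sig) | set(bkg)):
        eff_s = sum(1 for v in sig if v <= t) / len(sig)
        eff_b = sum(1 for v in bkg if v <= t) / len(bkg)
        rejection = float("inf") if eff_b == 0 else 1.0 / eff_b
        points.append((eff_s, rejection))
    return points


def rejection_at(points, target_eff):
    """Best background rejection among cuts with signal efficiency >= target."""
    return max(rej for eff, rej in points if eff >= target_eff)


# Toy values of a D2-like variable (made-up numbers).
toy_sig = [0.6, 0.9, 1.1, 1.4]
toy_bkg = [0.7, 1.8, 2.2, 2.6, 3.0, 3.4, 3.8, 4.2]
points = roc_points(toy_sig, toy_bkg)
rej_50 = rejection_at(points, 0.5)
```

Comparing `rejection_at` for D2 and for the decorrelated variable in separate pT ranges reproduces the kind of comparison described in the bullets.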
-
Summary and Todo
▪ Checked the performance of both the two-variable and three-variable W/Z taggers
   ▪ Two variables: FJetMass + D2
   ▪ Three variables: FJetMass + D2 + NTrk500 is the best
▪ Validation of LCTopo jets
   ▪ For both taggers, BottomUp SD is better than LCTopo jets
   ▪ Z tagging: BottomUp SD is the best
▪ Simple mass-decorrelated tagger
   ▪ Used the k-NN method to build a simple D2 k-NN tagger
   ▪ ROC curves show that it is not as good as D2 (needs validation)
▪ Todo: combine the new mass-decorrelated D2 with Ntrk500 and compare with the previous cuts
-
Backups
-
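Variable Distribution --- D2
[Figure panels: D2 distributions for β = 0.5, β = 1.2, β = 2.0]
▪ D2 gives sensible values for systems that have zero total momentum and for events that are dijet-like.
$D_2^{(\beta)} = \dfrac{e_3^{(\beta)}}{\left(e_2^{(\beta)}\right)^3}$
$e_3^{(\beta)} = \dfrac{1}{p_{T,J}^3} \sum_{i<j<k \in J} p_{T,i}\, p_{T,j}\, p_{T,k}\, R_{ij}^{\beta} R_{ik}^{\beta} R_{jk}^{\beta}$
$e_2^{(\beta)} = \dfrac{1}{p_{T,J}^2} \sum_{i<j \in J} p_{T,i}\, p_{T,j}\, R_{ij}^{\beta}$
$p_{T,J}$: transverse momentum of the jet with respect to the beam
$p_{T,i}$: transverse momentum of particle i
$R_{ij}^2 = (\phi_i - \phi_j)^2 + (y_i - y_j)^2$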
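The energy correlation functions and D2 can be evaluated directly from the jet constituents. The sketch below assumes each constituent is a dict with keys `pt`, `y` and `phi` and that the jet pT is supplied by the caller; these conventions and the toy values are illustrative, not taken from the analysis code.

```python
import math
from itertools import combinations


def delta_r(a, b):
    """R_ij from rapidity and azimuth, with the phi difference wrapped to [-pi, pi]."""
    dphi = math.remainder(a["phi"] - b["phi"], 2.0 * math.pi)
    dy = a["y"] - b["y"]
    return math.hypot(dphi, dy)


def ecf2(parts, pt_jet, beta):
    """e2: pT-weighted sum over constituent pairs, normalized by pT_J^2."""
    total = sum(
        a["pt"] * b["pt"] * delta_r(a, b) ** beta
        for a, b in combinations(parts, 2)
    )
    return total / pt_jet ** 2


def ecf3(parts, pt_jet, beta):
    """e3: pT-weighted sum over constituent triplets, normalized by pT_J^3."""
    total = sum(
        a["pt"] * b["pt"] * c["pt"]
        * (delta_r(a, b) * delta_r(a, c) * delta_r(b, c)) ** beta
        for a, b, c in combinations(parts, 3)
    )
    return total / pt_jet ** 3


def d2(parts, pt_jet, beta=1.0):
    """D2 = e3 / e2^3."""
    return ecf3(parts, pt_jet, beta) / ecf2(parts, pt_jet, beta) ** 3


# Toy three-constituent jet (made-up values) with R12 = 0.3, R13 = 0.4, R23 = 0.5.
toy_parts = [
    {"pt": 50.0, "y": 0.0, "phi": 0.0},
    {"pt": 30.0, "y": 0.3, "phi": 0.0},
    {"pt": 20.0, "y": 0.0, "phi": 0.4},
]
d2_three = d2(toy_parts, pt_jet=100.0, beta=1.0)
d2_two = d2(toy_parts[:2], pt_jet=80.0, beta=1.0)
```

A jet with only two constituents has no triplets, so e3 and therefore D2 vanish, which is why low D2 values indicate a two-prong structure.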
-
Variable Distribution --- FJet mass & Ntrk500 & KtDR
▪ FJet mass distribution for W boson
▪ Two-variable tagger: FJet mass and D2 as the baseline
▪ Three-variable tagger: try the Ntrk variable first
▪ Ntrk500 is the number of tracks with pT > 500 MeV that are ghost-associated to the jet
▪ KtDR is the ΔR between the two subjets obtained by reclustering the constituents of the anti-kt R = 1.0 jet into two subjets.
-
Variable Distribution --- Fjet Tau21
▪ Tau_21 uses N-subjettiness to effectively “count” the number of subjets in a given jet.
$\tau_1 = \dfrac{1}{d_0} \sum_k p_{T,k} \min\{\Delta R_{1,k}\}$
$\tau_2 = \dfrac{1}{d_0} \sum_k p_{T,k} \min\{\Delta R_{1,k}, \Delta R_{2,k}\}$
$d_0 = \sum_k p_{T,k} R_0$
$\tau_{21} = \tau_2 / \tau_1$ is the discriminating variable
[Figure panels: τ1, τ2, τ21]
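The N-subjettiness formulas can be evaluated once the subjet axes are known. The sketch below assumes the axes are supplied by the caller (axis finding is not covered here) and that constituents and axes are dicts with keys `y` and `phi`, plus `pt` for constituents; the toy two-prong jet is made up.

```python
import math


def delta_r(a, b):
    """Angular distance in (y, phi), with the phi difference wrapped to [-pi, pi]."""
    dphi = math.remainder(a["phi"] - b["phi"], 2.0 * math.pi)
    return math.hypot(dphi, a["y"] - b["y"])


def tau_n(parts, axes, r0=1.0):
    """tau_N = (1/d0) * sum_k pT_k * min over axes of dR(axis, k)."""
    d0 = sum(p["pt"] for p in parts) * r0
    total = sum(p["pt"] * min(delta_r(p, ax) for ax in axes) for p in parts)
    return total / d0


# Toy two-prong jet: two hard constituents and one soft one near the first prong.
toy_parts = [
    {"pt": 50.0, "y": 0.2, "phi": 0.0},
    {"pt": 50.0, "y": -0.2, "phi": 0.0},
    {"pt": 10.0, "y": 0.2, "phi": 0.1},
]
one_axis = [{"y": 0.0, "phi": 0.0}]
two_axes = [{"y": 0.2, "phi": 0.0}, {"y": -0.2, "phi": 0.0}]

tau1 = tau_n(toy_parts, one_axis)
tau2 = tau_n(toy_parts, two_axes)
tau21 = tau2 / tau1
```

For this toy jet τ21 is small, as expected for a two-prong (W/Z-like) jet; a one-prong jet would give τ2 close to τ1.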
-
Introduction -- Soft Drop
▪ Soft drop declustering recursively removes soft wide-angle radiation from a jet:
1. Break the jet J into two subjets, j1 and j2.
2. If the subjets pass the cut condition $\dfrac{\min(p_{T1}, p_{T2})}{p_{T1} + p_{T2}} > z_{\mathrm{cut}} \left(\dfrac{\Delta R_{12}}{R_0}\right)^{\beta}$, J is the final soft-drop jet ($z_{\mathrm{cut}}$: soft drop threshold, $\beta$: angular exponent).
3. Otherwise, redefine J to be the subjet with the larger pT and iterate the procedure.
4. If J can no longer be declustered, either remove it or leave it as the final jet (two modes).
Recursive Soft Drop:
1. After C/A reclustering, take the remaining branch whose two parent subjets have the widest separation in ∆R, and label these j1 and j2.
2. Apply the same cut condition as in SD.
3. If the two subjets pass, both are kept; otherwise, the softer one is removed.
BottomUp SD:
Recluster the jet while applying the SD condition from the leaves to the branches.
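The soft drop condition and the iteration along the harder branch can be sketched as follows. This assumes the C/A declustering history is already available as a list of `(pt1, pt2, delta_r12)` tuples, ordered from the first declustering step onward and always following the harder subjet; producing that history requires a jet clustering library and is not shown. Function names and toy values are illustrative.

```python
def passes_soft_drop(pt1, pt2, delta_r12, z_cut=0.1, beta=1.0, r0=1.0):
    """Soft drop condition: min(pT1, pT2)/(pT1 + pT2) > z_cut * (dR12/R0)^beta."""
    z = min(pt1, pt2) / (pt1 + pt2)
    return z > z_cut * (delta_r12 / r0) ** beta


def soft_drop(splittings, z_cut=0.1, beta=1.0, r0=1.0):
    """Return the index of the first splitting that passes soft drop.

    Every earlier splitting fails, so its softer subjet is dropped.
    Returns None if no splitting passes, i.e. the jet cannot be
    declustered any further.
    """
    for index, (pt1, pt2, delta_r12) in enumerate(splittings):
        if passes_soft_drop(pt1, pt2, delta_r12, z_cut, beta, r0):
            return index
    return None


# Toy history (made-up values): a soft wide-angle emission first,
# then a balanced two-prong splitting.
toy_history = [(480.0, 20.0, 0.9), (300.0, 180.0, 0.4)]
first_passing = soft_drop(toy_history, z_cut=0.1, beta=1.0)
```

Here the first splitting has z = 0.04 against a threshold of 0.09, so the soft emission is removed; the second splitting passes and defines the groomed jet.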
-
K-NN
▪ k-nearest-neighbor method
▪ The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples. In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label that is most frequent among the k training samples nearest to that query point.
▪ In k-NN regression, the output is the property value for the object. This value is the average of the values of the k nearest neighbors.
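Both variants fit in a few lines. This is a minimal sketch with Euclidean distance and unweighted votes and averages; the training points are made up for illustration.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """The k training items closest to `query`; each item is (features, target)."""
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]


def knn_classify(train, query, k):
    """Most frequent label among the k nearest training samples."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k):
    """Average of the target values of the k nearest training samples."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)


# Toy classification sample.
labelled = [
    ((0.0, 0.0), "bkg"),
    ((0.1, 0.2), "bkg"),
    ((1.0, 1.0), "sig"),
    ((0.9, 1.1), "sig"),
    ((1.2, 0.8), "sig"),
]
predicted_label = knn_classify(labelled, (1.0, 0.9), k=3)

# Toy regression sample in one dimension.
valued = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 8.0)]
predicted_value = knn_regress(valued, (1.4,), k=2)
```

Using an odd k avoids ties in a two-class vote.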