
4.3 Model independent mass reconstruction

4.3.2 Method performance

4.3.2.1 Solution selection

It is important to test the method's performance using generator level events. The following studies are performed using generator level t¯tH, H → b¯b Pythia events with zero width. The events were generated with mtop = 172 GeV, mW = 80 GeV and mHiggs = 125 GeV and have not undergone the detector smearing process. The solution weights are evaluated using LHAPDF version 6.1.2 and the PDF set CT10. The energy scale is set to Q = 235 GeV ≈ (2·mtop + mHiggs)/2. Initially, we test the method's performance by allowing mtop and mW to take values within the limits [0,300] GeV and [0,150] GeV respectively, with a 5 GeV step. In order to show that the maximum weight solution enhances the method's performance, we study three different cases with respect to the solution selection:

• Considering all the solutions that the method yields, assuming they have an equal weight.

• Considering all the solutions that the method yields, by weighting each solution with its corresponding PDF weight.

• Considering for each event the maximum weight solution.
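The three cases above can be illustrated with a minimal Python sketch. It assumes that each event is available as a list of (mb¯b, PDF weight) pairs, one pair per solution; the function name `fill_histograms`, the dictionary keys and the toy numbers are hypothetical and serve only to show how the three selections differ when filling the Higgs mass histogram.

```python
from collections import defaultdict


def fill_histograms(events, bin_width=5.0):
    """Fill one m_bb histogram per solution-selection scheme.

    events: list of events, each event being a list of
            (m_bb, pdf_weight) tuples, one tuple per solution.
    Returns a dict of {scheme: {bin_index: content}}.
    """
    hists = {
        "equal": defaultdict(float),         # all solutions, weight 1
        "pdf_weighted": defaultdict(float),  # all solutions, PDF weight
        "max_weight": defaultdict(float),    # one solution per event
    }
    for solutions in events:
        if not solutions:
            continue  # the method found no solution for this event
        for m_bb, weight in solutions:
            bin_index = int(m_bb // bin_width)
            hists["equal"][bin_index] += 1.0
            hists["pdf_weighted"][bin_index] += weight
        # keep only the solution with the largest PDF weight
        best_m_bb, _ = max(solutions, key=lambda s: s[1])
        hists["max_weight"][int(best_m_bb // bin_width)] += 1.0
    return hists


# Toy input (invented numbers): two events with two and one solutions.
toy_events = [[(124.0, 0.1), (60.0, 0.5)], [(126.0, 0.9)]]
toy_hists = fill_histograms(toy_events)
```

In the first two schemes every solution enters the histogram, so an event with many solutions contributes many entries; in the third scheme each event contributes exactly one entry.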

Considering all the solutions equally weighted

We can observe that when considering all solutions equally weighted, the mtop and mW

Figure 4.4: Reconstructed Higgs mass when all the solutions that the method yields are assumed to have an equal weight. We loop independently over values of mtop ∈ [0,300] GeV and mW ∈ [0,150] GeV with a 5 GeV step.

(a) mtop distribution of all the solutions that the method yields.

(b) mW distribution of all the solutions that the method yields.

Figure 4.5: Mass distributions after the analytical solution of the t¯t system of equations, without taking into account the PDF weight of each solution.

Figure 4.6: mtop vs mW contour of all the solutions that the method yields, without taking into account the PDF weight of each solution.

in the two-dimensional mtop vs mW contour. In Figure 4.4, we observe that the Higgs mass distribution has tails that extend to both low and high mass values and, as expected, has a peak at mb¯b = 125 GeV.

Considering all the solutions weighted with their corresponding PDF weight

When weighting each solution with its corresponding PDF weight, we can observe (Figures 4.8 and 4.9) that the previously visible shift in both the mtop and the mW distributions vanishes and, as a result, the distributions peak around their expected values. In contrast to this improvement, the shape of the Higgs mass distribution, which is shown in Figure 4.7, remains approximately the same. For this reason it is important to find a specific solution selection that enhances the peak of the Higgs mass distribution while at the same time suppressing its tails.

Figure 4.7: Reconstructed Higgs mass when weighting each solution with its corresponding PDF weight. We loop independently over values of mtop ∈ [0,300] GeV and mW ∈ [0,150] GeV with a 5 GeV step.

Considering the maximum weight solution

Instead of using all possible solutions that the method yields, we select only one solution per event, namely the one with the maximum PDF weight. In Figure 4.10 we can observe that the shape of the Higgs mass distribution is similar, but the peak has been enhanced.

To make the differences in the shape of the Higgs mass distribution among the solution selections visible, Figure 4.11 shows the Higgs mass distributions for the three cases discussed above, superimposed and normalized to area. When using the maximum weight solution selection, the method correctly reconstructs the Higgs boson mass for approximately 35% of the events. Furthermore, both the mtop and the mW distributions have the expected shapes (Figure 4.12). We observe that both distributions form a "sharper" peak around a mass value that is slightly shifted to lower values. This systematic shift can also be observed in the two-dimensional contour (Figure 4.13). We can finally conclude that the choice of the maximum weight solution is the most efficient and, for the purpose of this thesis, it will be used in all following studies.

(a) mtop distribution of all the solutions that the method yields.

(b) mW distribution of all the solutions that the method yields.

Figure 4.8: Mass distributions of the method, when weighting each solution with its corresponding PDF weight. We loop independently over values of mtop ∈ [0,300] GeV and mW ∈ [0,150] GeV with a 5 GeV step.

Figure 4.9: mtop vs mW distribution of all the solutions that the method yields.

Figure 4.10: Reconstructed Higgs mass when selecting the maximum weight solution per event. We loop independently over values of mtop ∈ [0,300] GeV and mW ∈ [0,150] GeV with a 5 GeV step.

Figure 4.11: Reconstructed Higgs mass in the three cases discussed, superimposed and normalized to area.

Since our goal is not to rediscover the top quark and W boson masses but to reconstruct the Higgs boson mass, we have the freedom to constrain their values according to their reconstruction resolution. Taking into account that the reconstructed mass values can deviate by ∼20% from their corresponding measured values (mtop ≈ 172 GeV, mW ≈ 80 GeV), we select the following limits for the mtop and mW values:

• mtop ∈ [150,200] GeV with a 5 GeV step.

• mW ∈ [60,100] GeV with a 5 GeV step.
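To get a feeling for how much the constrained limits reduce the scan, the following short Python sketch builds the (mtop, mW) grid for both choices of limits. It assumes that both endpoints of each interval are included in the loop, which the text does not state explicitly; the helper names `mass_grid` and `scan_points` are hypothetical.

```python
def mass_grid(lo, hi, step=5):
    """Mass hypotheses in GeV from lo to hi (both included, by assumption)."""
    return list(range(lo, hi + step, step))


def scan_points(mtop_range, mw_range, step=5):
    """All (mtop, mW) pairs that are looped over independently."""
    return [
        (mtop, mw)
        for mtop in mass_grid(*mtop_range, step)
        for mw in mass_grid(*mw_range, step)
    ]


unconstrained = scan_points((0, 300), (0, 150))
constrained = scan_points((150, 200), (60, 100))
print(len(unconstrained), len(constrained))  # 1891 and 99 grid points
```

Under this assumption the constrained loop visits 11 × 9 = 99 mass hypotheses per event instead of 61 × 31 = 1891.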

(a) mtop distribution of the maximum weight solution per event that the method yields.

(b) mW distribution of the maximum weight solution per event that the method yields.

Figure 4.12: Mass distributions of the method, when selecting only the maximum weight solution per event. We loop independently over values of mtop ∈ [0,300] GeV and mW ∈ [0,150] GeV with a 5 GeV step.

Figure 4.13: mtop vs mW distribution of the maximum weight solution per event.

As we can see in Figures 4.14 and 4.15, the shapes of the Higgs mass distribution obtained with a constrained and an unconstrained mass loop for mtop and mW are similar with respect to the low and high mass tails. With the more constrained loop, however, the number of events that reside in the peak increases. In this case, the method correctly reconstructs the Higgs boson mass for approximately 43% of the events. As a result, the choice of the limits mtop ∈ [150,200] GeV and mW ∈ [60,100] GeV increases the efficiency with respect to the peak in the Higgs mass distribution and, for the purpose of this thesis, will be used in all following studies.

Figure 4.14: Reconstructed Higgs mass when selecting the maximum weight solution per event. We loop independently over values of mtop ∈ [150,200] GeV and mW ∈ [60,100] GeV with a 5 GeV step.

Figure 4.15: Reconstructed Higgs mass for two different mtop and mW loops, superimposed and normalized to the corresponding number of events.

4.3.2.3 Parton interactions and PDFs

Finally, it is essential to study the effect of two additional parameters on the Higgs mass distribution. The first parameter is the type of parton interaction considered in the pp

In the previous studies we considered only gluon-gluon parton interactions when calculating the PDF weight of each solution and consequently the maximum weight. In order to check the validity of our assumption, we compare the Higgs boson mass distribution of the maximum weight solution per event in two different cases:

• Choosing the maximum weight solution per event considering gg interactions only.

• Choosing the maximum weight solution per event considering all possible parton interactions (gg, u¯u, ¯uu, d¯d, ¯dd).
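The two cases can be compared schematically as follows. This section does not give the exact definition of the PDF weight, so the sketch below simply assumes that it is the product of the two parton densities, summed over the allowed initial states. The function `toy_xf` is an invented stand-in for x·f(x, Q) and is not a real PDF; all other names are hypothetical as well.

```python
# Hypothetical sketch: the weight definition is an assumption, not the thesis code.
GLUON, UP, DOWN = 21, 2, 1  # PDG particle IDs; antiquarks carry a minus sign

CHANNELS_GG = [(GLUON, GLUON)]
CHANNELS_ALL = CHANNELS_GG + [(UP, -UP), (-UP, UP), (DOWN, -DOWN), (-DOWN, DOWN)]


def toy_xf(pid, x, q):
    """Toy stand-in for x*f(x, Q). Invented shapes, NOT a real PDF set."""
    if pid == GLUON:
        return 3.0 * (1.0 - x) ** 5
    if pid > 0:
        return 0.5 * x ** 0.5 * (1.0 - x) ** 3
    return 0.1 * (1.0 - x) ** 7


def pdf_weight(x1, x2, q, channels, xf=toy_xf):
    """Assumed weight: sum over channels of f_a(x1, Q) * f_b(x2, Q)."""
    return sum(xf(a, x1, q) / x1 * xf(b, x2, q) / x2 for a, b in channels)


def max_weight_solution(solutions, q, channels, xf=toy_xf):
    """Return the solution (dict with 'x1', 'x2', 'm_bb') of largest weight."""
    return max(
        solutions,
        key=lambda s: pdf_weight(s["x1"], s["x2"], q, channels, xf),
    )


# Toy solutions of one event (invented numbers).
toy_solutions = [
    {"x1": 0.1, "x2": 0.1, "m_bb": 125.0},
    {"x1": 0.5, "x2": 0.5, "m_bb": 300.0},
]
best_gg = max_weight_solution(toy_solutions, 235.0, CHANNELS_GG)
best_all = max_weight_solution(toy_solutions, 235.0, CHANNELS_ALL)
```

With the LHAPDF Python bindings, `toy_xf` could be replaced by a wrapper around `lhapdf.mkPDF("CT10", 0).xfxQ(pid, x, q)`. If the gluon density dominates, both channel lists select the same solution, which is the behaviour the comparison is meant to test.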

Figure 4.16: Higgs boson mass distribution when considering only gluon-gluon interactions (black) and when considering all possible parton interactions (magenta), superimposed and normalized to area.

In Figure 4.16 we can observe that the distributions are identical and that the dominant parton interaction is the gg interaction. As a result, without loss of generality, the following studies will be performed considering gg interactions only.

Finally, we also choose a different PDF set (NNPDF21_lo_as_0119_100) and repeat all the steps of the analytical solution method, in order to determine whether our specific choice of PDF set has an effect on the shape of the Higgs mass distribution. As we can see in Figure 4.17, the shapes of the distributions are identical. Consequently, for the purpose of this thesis, the following studies will be performed using the PDF set CT10.¹

¹ PDF sets can be found at the link http://www.hepforge.org/archive/lhapdf/pdfsets/6.1/

Figure 4.17: Higgs boson mass distribution for two different PDF sets, superimposed and normalized to area.

5 Search for the t¯tH, H → b¯b using CMS Reconstruction level events

In Section 4.3.2 we studied the performance of the model independent mass reconstruction method in the t¯tH, H → b¯b dilepton channel using equivalent generator level events. In the following section we repeat the steps leading to the reconstruction of the Higgs boson mass, this time using MC reconstruction level events that model the signal and the backgrounds of this particular process. A pre-selection is applied to all the MC samples in order to reduce the background while minimizing the number of signal events that are rejected. Our main goal is to estimate the significance of the H → b¯b signal over the expected background for a luminosity of 200 fb−1.

5.1 CMS Simulation Samples

5.1.1 Signal Samples

For the study of the t¯tH, H → b¯b dilepton channel, the signal samples were modelled using the POWHEG Box VERSION 2 Monte Carlo (MC) matrix element generator, with PYTHIA8 as the general-purpose MC event generator. The samples used in this analysis, along with their corresponding cross sections and branching ratios, are listed in Table 5.1.

Signal Samples

Higgs decay | Dataset | σ × B [pb]
t¯tH, H → b¯b | /ttHTobb_M125_TuneCUETP8M2_ttHtranche3_13TeV-powhegpythia8/RunIISpring16MiniAODv2-premix_withHLT_80X_mcRun2_asymptotic_v14v1/MINIAODSIM | 0.5824 × 0.5071
t¯tH, H → non b¯b | /ttHToNonbb_M125_TuneCUETP8M2_ttHtranche3_13TeV-powhegpythia8/RunIISpring16MiniAODv2-premix_withHLT_80X_mcRun2_asymptotic_v14v1/MINIAODSIM | 0.4176 × 0.5071

Table 5.1: List of MC signal samples for the t¯tH dilepton channel with their corresponding cross sections and branching ratios.

5.1.2 Background Samples

The background samples were modelled using MC event samples from the RunIISpring16 MC campaign. Most of the samples are generated at next-to-leading order (NLO) in perturbation theory, either with the MG5_aMC@NLO matrix element generator matched to the general-purpose event generator PYTHIA8, or with the NLO matrix element generator POWHEG Box VERSION 2 combined with PYTHIA8. These samples are reconstructed using CMSSW version 80X. The background samples used in this analysis, along with their corresponding cross sections, are listed in Table 5.2.

Background Samples

Process | Dataset | Cross section [pb]
t¯t+jets, all decays | /TT_TuneCUETP8M1_13TeV-powheg-pythia8/RunIISpring16MiniAODv2-PUSpring16RAWAODSIM_reHLT_80X_mcRun2_asymptotic_832_v14_ext3-v1 | 831.8
Z/γ*+jets, 10 GeV < M < 50 GeV | /DYJetsToLL_M-10to50_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIIFall15MiniAODv2-PU25nsData2015v1_76X_mcRun2_asymptotic_v12-v1 | 22635.09
Z/γ*+jets, M > 50 GeV | /DYJetsToLL_M-50_TuneCUETP8M1_13TeV-amcatnloFXFX-pythia8/RunIISpring16MiniAODv2-PUSpring16RAWAODSIM_reHLT_80X_mcRun2_asymptotic_v14-v1 | 6025.2
Single t | /ST_tW_top_5f_inclusiveDecays_13TeV-powheg-pythia8_TuneCUETP8M1/RunIISpring16MiniAODv2-PUSpring16_80X_mcRun2_asymptotic_2016_miniAODv2_v0-v2 | 35.6
W+jets, W → lν | /WJetsToLNu_HT-XToY_TuneCUETP8M1_13TeV-madgraphMLM-pythia8/RunIISpring16MiniAODv2-PUSpring16_80X_mcRun2_asymptotic_2016_miniAODv2_v0_ext1-v1 | 0.21
WW | /WW_TuneCUETP8M1_13TeV-pythia8/RunIISpring16MiniAODv2-PUSpring16_80X_mcRun2_asymptotic_2016_miniAODv2_v0-v1 | 118.7
WZ | /WZ_TuneCUETP8M1_13TeV-pythia8/RunIISpring16MiniAODv2-PUSpring16_80X_mcRun2_asymptotic_2016_miniAODv2_v0-v1 | 47.13
ZZ | /ZZ_TuneCUETP8M1_13TeV-pythia8/RunIISpring16MiniAODv2-PUSpring16_80X_mcRun2_asymptotic_2016_miniAODv2_v0-v1 | 16.523
t¯t+W, W → lν | /TTWJetsToLNu_TuneCUETP8M1_13TeV-amcatnloFXFX-madspin-pythia8/RunIISpring16MiniAODv2-premix_withHLT_80X_mcRun2_asymptotic_v14-v1 | 0.21
t¯t+W, W → qq | /TTWJetsToQQ_TuneCUETP8M1_13TeV-amcatnloFXFX-madspin-pythia8/RunIISpring16MiniAODv2-PUSpring16_80X_mcRun2_asymptotic_2016v_miniAODv2_v0-v1 | 0.21
t¯t+Z, Z → qq | /TTZToQQ_TuneCUETP8M1_13TeV-amcatnlo-pythia8/RunIISpring16MiniAODv2-PUSpring16_80X_mcRun2_asymptotic_2016_miniAODv2_v0-v1 | 0.611

Table 5.2: List of MC background samples with their corresponding cross sections.

5.2 Pre-selection Criteria and event yields

As mentioned in the beginning of this chapter, it is essential to apply pre-selection criteria to the simulation samples used in the analysis. The purpose of the pre-selection is to ensure that the quality of the objects used in the analysis (jets, leptons etc.) is optimal and, depending on the process we are studying, to reduce the corresponding background while minimizing the number of signal events that are rejected. To obtain good quality Particle Flow (PF) jets, a minimum pT of 30 GeV/c is required. The lepton pT cut is looser, requiring

jets. The remaining pre-selection criteria ensure that the final state objects correspond to those of the t¯tH, H → b¯b dileptonic decay, which consists of four b-jets, two leptons and missing energy due to the presence of neutrinos. In order to identify jets that originate from b-quark decays, the CSVv2 b-tagging discriminator is used. There are three selection criteria, often called working or operating points (WP): the loose (CSVv2 > 0.460), the medium (CSVv2 > 0.800) and the tight (CSVv2 > 0.935) WP. For the purpose of this analysis, the medium WP is used. The event pre-selection criteria are listed in detail below.

• #PF jets > 3 and #leptons > 1

For the four leading pT jets and the two leading pT leptons:

• #b-tagged PF jets = 4

• b-tag at medium WP (> 0.8)

• pT > 30 GeV/c for all PF jets

• pT > 25 GeV/c for the leading lepton

• pT > 15 GeV/c for the subleading lepton

• |η| < 2.4 for leptons and ≥ 2 jets

• MET > 40 GeV (in same flavour channels)

• mll > 20 GeV (in same flavour channels)

• Z veto (76 GeV < mll < 106 GeV) (in same flavour channels)
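The criteria listed above can be sketched as a single selection function. This is a minimal Python illustration and not the analysis code: the `Jet` and `Lepton` containers and the function name are hypothetical, and the sketch assumes that the jet pT and |η| requirements are applied to the four leading jets and that MET and mll are already computed per event.

```python
from dataclasses import dataclass

CSVV2_MEDIUM = 0.800  # medium working point quoted in the text


@dataclass
class Jet:
    pt: float   # GeV/c
    eta: float
    csv: float  # CSVv2 discriminator value


@dataclass
class Lepton:
    pt: float     # GeV/c
    eta: float
    flavour: str  # "e" or "mu"


def passes_preselection(jets, leptons, met, mll):
    """Return True if the event passes the dilepton pre-selection."""
    if len(jets) <= 3 or len(leptons) <= 1:
        return False
    jets4 = sorted(jets, key=lambda j: j.pt, reverse=True)[:4]
    lep1, lep2 = sorted(leptons, key=lambda l: l.pt, reverse=True)[:2]
    # jet kinematics (assumed to apply to the four leading jets)
    if any(j.pt <= 30.0 or abs(j.eta) >= 2.4 for j in jets4):
        return False
    # all four leading jets b-tagged at the medium working point
    if sum(j.csv > CSVV2_MEDIUM for j in jets4) != 4:
        return False
    # lepton kinematics
    if lep1.pt <= 25.0 or lep2.pt <= 15.0:
        return False
    if abs(lep1.eta) >= 2.4 or abs(lep2.eta) >= 2.4:
        return False
    # extra requirements in the same flavour channels only
    if lep1.flavour == lep2.flavour:
        if met <= 40.0 or mll <= 20.0 or 76.0 < mll < 106.0:
            return False
    return True
```

For an eμ event the MET, mll and Z veto requirements are skipped, so such an event can pass even if mll lies inside the Z mass window.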

5.2.1 Sanity check

In order to cross-check our code, ntuple production and resulting event yields, we compare our event yields against those of the HIG-16-038 analysis using the same pre-selection, listed in Figure 5.1 [34]. The event yields of the two analyses, as well as their ratio, are shown in Table 5.3.

Figure 5.1: Object and event selection criteria used by the HIG-16-038 analysis for the t¯tH, H → b¯b dileptonic decay channel.

Process | Event yields | HIG-16-038 analysis event yields | Ratio
t¯t+jets, all decays | 1438 | 1676 | 0.86
t¯tH, all decays | 59 | 68 | 0.87

Table 5.3: Event yield comparison for the signal and the main background using the same pre-selection, scaled to a luminosity of 200 fb−1. The second column corresponds to the event yields calculated for this thesis when reproducing the HIG-16-038 analysis event selection, while the event yields of the third column are taken from the HIG-16-038 analysis and scaled to 200 fb−1. The fourth column corresponds to the ratio of the second over the third column.

According to the last column of Table 5.3, we were able to reproduce the event yields of the HIG-16-038 analysis to within ∼15%. The ∼15% difference arises because we have not yet applied the recommended DATA/MC scale factors, trigger and b-tagging efficiencies etc. After this brief sanity check, which ensured the validity of our code, ntuple production etc., we proceed with extrapolating the event yields to a luminosity of 200 fb−1 using the optimized pre-selection discussed in section 5.2.

5.2.2 Event yields

After applying the event pre-selection described in section 5.2, the resulting event yields for Lumi = 200 fb−1 are shown in Table 5.4 for the signal samples and in Table 5.5 for the background samples. In order to calculate the event yield of each process, we multiply the absolute number of events that pass the pre-selection by a corresponding weight that is defined as:

weight = (Lumi × σprocess) / NMC    (5.1)

where Lumi is the luminosity to which we scale the events, σprocess is the cross section of the process and NMC corresponds to the number of events produced by the MC generator prior to applying any event cuts.
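Equation (5.1) can be written as a short Python helper. The sketch assumes that the luminosity is given in fb−1 and the cross section in pb, so that a factor of 1000 is needed to convert between the two units; the function names and the numbers used in the example call (NMC and the number of passing events) are hypothetical.

```python
def event_weight(lumi_fb, sigma_pb, n_mc):
    """Weight of Eq. (5.1): Lumi * sigma_process / N_MC.

    lumi_fb  : luminosity in fb^-1
    sigma_pb : cross section (times branching ratio) in pb
    n_mc     : number of generated MC events before any cut
    """
    lumi_pb = lumi_fb * 1000.0  # 1 fb^-1 = 1000 pb^-1
    return lumi_pb * sigma_pb / n_mc


def event_yield(n_pass, lumi_fb, sigma_pb, n_mc):
    """Expected yield: unweighted events passing the cuts times the weight."""
    return n_pass * event_weight(lumi_fb, sigma_pb, n_mc)


# Example with the sigma x B of the H -> bb sample from Table 5.1;
# n_mc and n_pass are invented placeholders, not the real sample sizes.
sigma_times_br = 0.5824 * 0.5071
example_yield = event_yield(n_pass=1000, lumi_fb=200.0,
                            sigma_pb=sigma_times_br, n_mc=1_000_000)
```

Because the weight is the same for every event of a given sample, the statistical precision of a yield is set by the unweighted number of MC events that pass the selection, not by the scaled yield itself.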

Signal Samples

Process | Event yield
t¯tH, H → b¯b | 50
t¯tH, H → non b¯b | 2
Total Signal | 52

Table 5.4: Event yield of signal samples using the pre-selection described in section 5.2. The event yields are scaled to 200 fb−1 luminosity.

Background Samples

Process | Event yield
t¯t+jets, all decays | 1155
Z/γ*+jets, 10 GeV < M < 50 GeV | 0
Z/γ*+jets, M > 50 GeV | 0
Single t | 14
W+jets, W → lν | 0
WW | 0
WZ | 0
ZZ | 3
t¯t+W, W → lν | 2
t¯t+W, W → qq | 2
t¯t+Z, Z → qq | 15
Total Background | 1191

Table 5.5: Event yield of background samples using the pre-selection described in section 5.2. The event yields are scaled to 200 fb−1 luminosity.

According to the resulting yields for Lumi = 200 fb−1, we expect 50 signal events compared to approximately 1200 background events. As expected, the most dominant background is the t¯t+jets process, which constitutes ∼97% of the total background.
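The totals and the t¯t+jets fraction quoted above follow directly from Tables 5.4 and 5.5, as the following small Python check illustrates. The dictionary keys are shorthand labels chosen here for readability; the numbers are the event yields from the two tables.

```python
# Event yields for 200 fb^-1, copied from Tables 5.4 and 5.5.
signal_yields = {"ttH_bb": 50, "ttH_nonbb": 2}
background_yields = {
    "tt_jets": 1155,
    "DY_M10to50": 0,
    "DY_M50": 0,
    "single_t": 14,
    "W_jets": 0,
    "WW": 0,
    "WZ": 0,
    "ZZ": 3,
    "ttW_lnu": 2,
    "ttW_qq": 2,
    "ttZ_qq": 15,
}

total_signal = sum(signal_yields.values())
total_background = sum(background_yields.values())
tt_fraction = background_yields["tt_jets"] / total_background

print(total_signal, total_background, round(tt_fraction, 2))
```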

5.2.3 Higgs mass distributions

The next step after calculating the event yields is to use the model independent mass reconstruction method to reconstruct the mass of the two b jets assigned to the Higgs boson, for both the signal and the background samples. Regarding the signal samples, we expect the mb¯b distribution to be similar to that produced using generator level events, but with a smeared peak around 125 GeV instead of a delta function. Regarding the background samples, on the other hand, we do not expect the mb¯b distribution to form a peak around 125 GeV.

For each event that fulfils the event selection criteria, the four-vectors of the four leading pT jets and of the two leading pT leptons, as well as the missing energy along the x and y axes (METx, METy), are used as inputs to the method. In Figure 5.2 we can observe the Higgs mass distribution for the two signal samples, weighted with their corresponding weights. As expected, the distribution of the t¯tH, H → b¯b process forms a peak around 125 GeV and is significantly larger than the distribution of the t¯tH, H → non b¯b process.

In Figure 5.3 we can observe the different mb¯b distributions of the background samples. The most significant contribution comes from the t¯t+jets process, which as expected has a different shape from the signal. Consequently, the following studies will be performed considering only the t¯t+jets process contribution as background.

Figure 5.2: Higgs mass distribution of the t¯tH, H → b¯b (blue) and t¯tH, H → non b¯b (green) processes, weighted with their corresponding weights.

Figure 5.3: mb¯b distribution of the processes that contribute as backgrounds, weighted with their corresponding weights. We again observe that the t¯t+jets background (black) is dominant.

5.3 Optimization studies

In the following section the choice of the 4-btag requirement in the event pre-selection as well as an efficient way to measure the background contribution will be discussed.

5.3.1 b-tag multiplicity optimization

In section 5.2, where the event selection was described, an important requirement was that all four leading pT jets should be b-tagged. Jets originating from b-quarks, unlike jets originating from light quarks, are characterised by a relatively large mass (∼4.8 GeV) and a long lifetime (τb ∼ 1.6 ps), which corresponds to a flight distance in the detector that can be experimentally measured and identified as a secondary vertex (SV). The CSV algorithm, which is used to identify b jets, combines information about the SV and the jet lifetime using a multivariate algorithm. Finally, the algorithm provides for each jet a probability between 0 and 1 of originating from a b-quark.

It is interesting to study the effect that the number of jets required to be b-tagged has on the shape of the signal and background mb¯b distributions. We study three different cases, with exactly two, three and four b-tags among the four leading pT jets.

(a) Signal shapes of the mb¯b distribution when requesting exactly two (blue), three (magenta) or four (black) jets to be b-tagged.

(b) Background shapes of the mb¯b distribution when requesting exactly two (blue), three (magenta) or four (black) jets to be b-tagged.

Figure 5.4: Signal and background shapes of the mb¯b distribution when requesting different numbers of b-tagged jets.

We observe that as the b-jet multiplicity increases, the Higgs mass reconstruction for the signal sample improves, whereas the mb¯b distribution for the background sample remains roughly the same. This can also be confirmed in Figure 5.5, where the signal and background shapes are compared in each case. When requesting all four leading pT jets to be b-tagged, the signal and background distributions show the maximum shape separation.

(a) Signal and background mb¯b distribution where #b-tagged jets = 2

(b) Signal and background mb¯b distribution where #b-tagged jets = 3

(c) Signal and background mb¯b distribution where #b-tagged jets = 4

Figure 5.5: Signal and background mb¯b distributions superimposed and normalized to area, when requesting exactly two (a), three (b) or four (c) jets to be b-tagged.

5.3.2 Data-driven background estimation

In Figure 5.4(b) we observed that the shape of the mb¯b distribution for the t¯t+jets process does not change when the number of requested b-tags is two, three or four. Initially, this allows us to use the shape of the mb¯b distribution where #b-tagged jets = 2, scaling it to the number of events that the condition #b-tagged jets = 4 yields. This leads to a smoother background shape with smaller statistical fluctuations, resulting in a more accurate estimation of the signal over background significance.
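The rescaling described above amounts to normalizing the two b-tag histogram to unit area and multiplying it by the four b-tag yield. A minimal Python sketch, assuming the histogram is available as a plain list of bin contents (the function name and the toy bin contents are hypothetical; 1155 is the t¯t+jets yield from Table 5.5):

```python
def scale_template(shape_2b, n_events_4b):
    """Scale the 2 b-tag m_bb shape to the 4 b-tag event yield.

    shape_2b    : bin contents of the m_bb histogram with exactly 2 b-tags
    n_events_4b : event yield obtained with the 4 b-tag requirement
    """
    total = sum(shape_2b)
    if total <= 0:
        raise ValueError("template histogram is empty")
    return [n_events_4b * content / total for content in shape_2b]


# Toy 2 b-tag histogram (invented bin contents).
toy_shape_2b = [10.0, 40.0, 30.0, 20.0]
background_4b = scale_template(toy_shape_2b, 1155.0)
```

The scaled histogram keeps the relative bin-to-bin shape of the high-statistics two b-tag sample, while its integral equals the four b-tag yield.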

Apart from increasing the statistics of the MC background sample, this observation can give
