Clustering algorithms for
very low pT particles
in ATLAS
Carmen Iglesias
Dpto. Física Atómica y Nuclear
IFIC-Universidad de Valencia
XXX Biennial Meeting of the Real Sociedad Española de Física
Ourense University Campus of the Universidad de Vigo, 12-16 September 2005.
Why is Clustering useful for Energy Flow?
Samples used
    • DC1 samples of pions and neutrons (the main components of jets) at very low ET
     (pT = 1-30 GeV), because this is the ET range best suited to the Energy Flow algorithm.
    • Used to generate ntuples with 1000 events at η=0.3 (central barrel) and φ=1.6 of:
        • π0’s, to understand the behavior of photons inside the EM calorimeter.
        • π+’s and neutrons, to learn more about the hadronic shower.

      First without electronic noise applied, and later with it.




Shower composition
    The shower of the π0 has only e.m. components.
    [Figure: shower composition charts; labels: electrons, positrons, photons, π0, π-, π+, neutron, proton, e- and γ]
Total energy deposited
   • For the π0’s, since there are only e.m. particles, all the ET is expected to be deposited
    in the EM calorimeter.
   • For π+’s and neutrons the situation is different. Although high-pT particles usually
    deposit their ET only in the HAD calorimeter, at very low energy they also deposit
    part of their energy in the EM calorimeter (~40-50%), and this deposition increases
    with the ET of the particles.
INDEX
1)Compare Clustering Algorithms in ATLAS
2)Lower threshold for Seed and Neighbor cells
3)Cone algorithms
4)Topocluster analysis with Electronic Noise
Clustering Algorithms in ATLAS
   • Sliding Window (SW) Clustering
      • Simple search for local maxima of the ET deposit on a grid, using a fixed-size “window”
         made up of a group of contiguous cells in η-φ space. Local maxima are found by moving
         the window in fixed steps in η and φ.
      • The default cluster size is 5 x 5 cells. Other sizes used for SW clusters: 3x5 cells (for
         unconverted photons) and 3x7 cells (for electrons and converted photons).

   • EGAMMA Clusters
      • Combines Inner Detector track information with calorimeter clusters (SW), using the
       default size of 5 x 5 cells per cluster.
      • Useful for the identification of e.m. objects (photons and electrons).
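
The sliding-window search described above can be sketched as follows. This is only an illustration, not the ATLAS implementation: the grid layout, the function name `sliding_window_clusters` and the way ties and overlapping windows are resolved (highest window ET first, then highest central-cell ET) are assumptions made for this example.

```python
import numpy as np


def sliding_window_clusters(et, n_eta=5, n_phi=5, min_et=0.0):
    """Find non-overlapping fixed-size windows, highest window ET first.

    et is a 2D array of cell ET indexed as [i_eta, i_phi]; phi wraps around.
    Returns a list of (window_et, i_eta, i_phi) tuples.
    """
    n_rows, n_cols = et.shape
    h_eta, h_phi = n_eta // 2, n_phi // 2
    candidates = []
    for i in range(n_rows):
        lo, hi = max(0, i - h_eta), min(n_rows, i + h_eta + 1)
        for j in range(n_cols):
            # Columns of the window centred on cell (i, j), periodic in phi.
            cols = [(j + d) % n_cols for d in range(-h_phi, h_phi + 1)]
            window_et = float(et[lo:hi][:, cols].sum())
            if window_et > min_et:
                # The central-cell ET is used only to break ties (assumption).
                candidates.append((window_et, float(et[i, j]), i, j))
    candidates.sort(reverse=True)

    clusters = []
    for window_et, _, i, j in candidates:
        overlaps = False
        for _, ci, cj in clusters:
            d_phi = abs(j - cj)
            d_phi = min(d_phi, n_cols - d_phi)
            if abs(i - ci) < n_eta and d_phi < n_phi:
                overlaps = True
                break
        if not overlaps:
            clusters.append((window_et, i, j))
    return clusters


# Toy grid (values in GeV, invented for the example).
grid = np.zeros((20, 32))
grid[10, 16], grid[10, 17], grid[9, 16] = 5.0, 1.0, 1.0
grid[3, 30] = 2.0
sw_clusters = sliding_window_clusters(grid)
print(sw_clusters)
```

Passing `n_eta=3, n_phi=5` or `n_eta=3, n_phi=7` gives the other window sizes quoted on the slide.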

   • TopoCluster Algorithm
      For the reconstruction of hadronic showers, the energy depositions in nearby cells
      have to be merged into clusters.
      A cluster is built around a Seed Cell which has an ET above a certain threshold
      (Seedcut). The neighbours of the Seed Cell are scanned for their ET and are added to
      the cluster if this ET is above the Neighborcut. Then the neighbours of the
      neighbours are scanned, and so on.
      The cuts applied to the seed and to the neighbours depend on the noise in each cell.
      [Figure: η-φ grid of cells showing a Seed Cell and its Neighbour Cells]
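
The seed-and-neighbour growth described above can be sketched in a few lines. This is a simplified two-threshold version on a flat 2D grid, written only to illustrate the idea: the function name `topo_cluster`, the dictionary-based cell storage and the 8-cell neighbourhood are assumptions for this example and not the TopoCluster package itself. The cuts are expressed in units of the per-cell noise σ (E/σ).

```python
from collections import deque


def topo_cluster(energy, sigma, seed_cut=4.0, neigh_cut=2.0):
    """Grow clusters around seed cells.

    energy, sigma: dicts mapping (i_eta, i_phi) -> cell ET and cell noise (MeV).
    A cell seeds a cluster if E/sigma > seed_cut; neighbours are added while
    E/sigma > neigh_cut. Returns a list of clusters, each a set of cells.
    """
    def significance(cell):
        return energy[cell] / sigma[cell]

    seeds = sorted((c for c in energy if significance(c) > seed_cut),
                   key=significance, reverse=True)
    used = set()
    clusters = []
    for seed in seeds:
        if seed in used:
            continue  # already absorbed by a cluster with a stronger seed
        cluster = {seed}
        used.add(seed)
        queue = deque([seed])
        while queue:
            i, j = queue.popleft()
            # Scan the 8 neighbours, then the neighbours of the neighbours.
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    cell = (i + di, j + dj)
                    if cell in used or cell not in energy:
                        continue
                    if significance(cell) > neigh_cut:
                        used.add(cell)
                        cluster.add(cell)
                        queue.append(cell)
        clusters.append(cluster)
    return clusters


# Toy cells with a flat 70 MeV noise (illustrative values only).
energy = {(5, 5): 400.0, (5, 6): 200.0, (5, 7): 150.0, (5, 8): 100.0,
          (0, 0): 250.0}
sigma = {cell: 70.0 for cell in energy}
topo_low_cuts = topo_cluster(energy, sigma, seed_cut=4.0, neigh_cut=2.0)
topo_high_cuts = topo_cluster(energy, sigma, seed_cut=30.0, neigh_cut=3.0)
print(topo_low_cuts, topo_high_cuts)
```

In this toy input the cell at (0, 0) is above the neighbour cut but is never collected, because no seed is adjacent to it; with the seed cut at 30σ no cluster is formed at all.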
Clustering comparison
   • First, calculate the ET deposited in all the CELLs of the calorimeter and take it as
    the “reference Energy Flow”, i.e., the best resolution that could be reached by the most
    sophisticated algorithm, taking into account the whole ET in all the calorimeter.
      • For π0’s, compare the resolution of the “reference Energy Flow” with the resolution of:
           Sliding Window Cluster/EGAMMA cluster
           TOPOcluster in the EM calorimeter
      • For π+’s and neutrons, compare the resolution of the “reference Energy Flow” with:
           TOPOcluster in EM and Tile
           PT of TRACKS from XKalman


   • Compare different ways of reconstructing TopoClusters for VLE (very low energy) particles, to find:
      • the best ET resolution
      • the largest amount of ET deposited inside the cluster.
        Use these thresholds:




     Different settings for the EM Noise are also checked:
          EM Noise=10 MeV (lower than the realistic case, only useful for checking VLE particles)
          EM Noise=70 MeV (fixed default value for the EM calorimeter)
          CaloNoiseTool=true (package with a model for the electronic noise)
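
The comparison above needs one resolution figure per algorithm and per energy point. The slides do not say how the resolution is computed, so the sketch below assumes the common definition: spread of the reconstructed ET divided by its mean over the events of a sample. The function name `et_resolution` and the numbers are invented for the example.

```python
import statistics


def et_resolution(reco_et):
    """Relative ET resolution: RMS spread divided by the mean (assumed definition)."""
    mean = statistics.mean(reco_et)
    return statistics.pstdev(reco_et) / mean


# Invented reconstructed ET values (GeV) for two algorithms on the same events.
samples = {
    "all cells": [9.0, 10.0, 11.0],
    "cluster": [7.0, 10.0, 13.0],
}
resolutions = {name: et_resolution(values) for name, values in samples.items()}
for name, value in resolutions.items():
    print(f"{name}: {value:.3f}")
```

A smaller value means a better resolution, which is the sense in which the algorithms are ranked in the following slides.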
π+’s resolution
• The resolution from the PT of TRACKS is the best result, but it gets worse
as the ET of the particle increases.
• The best resolution for ET comes from the ET deposited in
all calorimeter cells.
• Around 30 GeV, the ET resolution becomes better than the PT resolution ->
limit of the Energy Flow algorithm.

neutrons resolution
The worst result is at 1 GeV:
• ET is very similar to the mass of the
neutron (~940 MeV).

  For the TOPOclusters, CaloNoiseTool is the most realistic simulation of the Electronic Noise.
  The rest of the analysis is done using it.
π0’s resolution

π0’s have better resolution
 than π+’s and neutrons.
For Sliding-Window clusters,
the results are always the same
as for EGamma.
TopoCluster not defined -> low multiplicity

• At 1, 3 and 5 GeV the TopoCluster results make no sense -> the energy resolution increases
instead of decreasing with ET. There is a loss in the deposited energy due to the low
multiplicity of these clusters.
2)Lower threshold for Seed and Neighbor cells
    • Loss of ET deposited in the TOPOcluster due to the low multiplicity of these clusters.
    • Lower cuts are needed for the generation of TOPOclusters:
        Seed_cut: E/σ= 30 -> 6, 5, 4…
        Neigh_cut: E/σ= 3 -> 3, 2.5, 2…
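
To see why the original cuts lose very low energy particles, the E/σ cuts can be converted into absolute cell thresholds. The sketch below assumes that σ is the per-cell electronic noise and uses the fixed 70 MeV EM noise quoted earlier purely as an illustrative value; the helper name `cut_in_mev` is hypothetical.

```python
def cut_in_mev(cut_in_sigma, noise_mev):
    """Convert a cut expressed in units of the noise sigma into MeV."""
    return cut_in_sigma * noise_mev


EM_NOISE_MEV = 70.0  # illustrative: the fixed default EM noise quoted in the slides

seed_thresholds = {cut: cut_in_mev(cut, EM_NOISE_MEV) for cut in (30, 6, 5, 4)}
neigh_thresholds = {cut: cut_in_mev(cut, EM_NOISE_MEV) for cut in (3, 2.5, 2)}

for cut, mev in seed_thresholds.items():
    print(f"Seed_cut  E/sigma = {cut:>4}: cell ET > {mev:6.0f} MeV")
for cut, mev in neigh_thresholds.items():
    print(f"Neigh_cut E/sigma = {cut:>4}: cell ET > {mev:6.0f} MeV")
```

Under these assumptions a seed cut of 30 asks for more than 2100 MeV in a single cell, which a 1 GeV particle cannot normally deposit, while a seed cut of 4 asks for only 280 MeV.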




     For π+’s and neutrons, the best
     resolution for TOPOclusters using
     CaloNoiseTool comes from
     Seed_cut=4 and Neigh_cut=2.

     With these cuts, the behaviour of the
     TOPOcluster resolution is closer to the
     resolution of the ET deposited in all
     cells of the calorimeter.
The resolution of TOPOclusters using
CaloNoiseTool and Seed_cut=6, 5 or 4
is even better than the resolution of EGamma.

With these new thresholds, the low efficiency of TopoClusters for these single particles
at 1-5 GeV has been practically eliminated, mainly in the π0 case. The worst result is for
neutrons at 1 GeV, but it also improves with the changed cuts.
Deposited Energy

 For π+’s and neutrons,
 changing the Seedcut from
 30 to 4 gives a large increase in
 the deposited energy,
 mainly at 1-5 GeV
 (the ET almost doubles).

For π0’s, with the new cuts, the
values of deposited ET for Topo
are very similar to the EGamma
ones and competitive with
the total energy in all the cells.
3)Cone algorithms
    • Next, study the ET inside a cone with a radius ∆R = √(∆η² + ∆φ²)
       • Different strategies are followed for the different types of particle

              Neutral pions
                  •Cone centred on the η-φ coord of the EGAMMA cluster
                  •Cone centred on the η-φ coord of the TOPO cluster in the EM cal
                  •Cone centred on the η-φ coord of the TRUTH generated π0
              Charged pions
                  •Cone centred on the η-φ of the TRUTH generated π±
                  •Cone centred on the η-φ of the TRACK position at the 2nd layer
              Neutrons
                  •Cone centred on the η-φ of the TRUTH generated neutrons
         • As a first step, a cone with ∆R<1.0 is used -> at this stage the only
          requirement is to select the cone algorithm with the best resolution:
               For π0’s and neutrons:
                    Cone centred on the η-φ coord of TRUTH
               For π±’s:
                    Cone centred on the η-φ of the TRACK position at the 2nd layer

    But with ∆R<1.0, more than one shower is taken into account in the same cluster.
    ∆R needs to be defined for each type of particle.
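
A cone sum of this kind can be sketched as below. The cell list, the function names and the energies are invented for the example; only the ∆R definition and the sample direction (η=0.3, φ=1.6) come from the slides. The ∆φ difference is wrapped into [-π, π], which is a standard precaution and not something the slides spell out.

```python
import math


def delta_phi(phi1, phi2):
    """Signed phi difference wrapped into [-pi, pi]."""
    d = (phi1 - phi2) % (2.0 * math.pi)
    return d - 2.0 * math.pi if d > math.pi else d


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in eta-phi space: sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def cone_et(cells, eta0, phi0, r_max):
    """Sum the ET of all cells with delta_r < r_max around (eta0, phi0).

    cells is an iterable of (eta, phi, et) tuples.
    """
    return sum(et for eta, phi, et in cells
               if delta_r(eta, phi, eta0, phi0) < r_max)


# Invented cells (eta, phi, ET in MeV) around the sample direction.
cells = [(0.30, 1.60, 500.0), (0.35, 1.65, 200.0),
         (0.45, 1.60, 100.0), (0.30, 2.30, 50.0)]
cone_sums = {r: cone_et(cells, 0.3, 1.6, r) for r in (0.1, 0.2, 1.0)}
print(cone_sums)
```

The cone centre (`eta0`, `phi0`) is where the strategies listed above differ: it can be taken from the EGAMMA or TOPO cluster, from the truth particle, or from the track position.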
Defining ∆R for the cone algorithm
   • For π0’s:
    From the “Calorimeter Performance” analysis, the cluster sizes are (for E<100 GeV):
     • Unconverted photons: 5x3 cells -> ∆φ=0.0625, ∆η=0.0375 (∆R<0.073)
     • Converted photons and electrons: 7x3 cells -> ∆φ=0.0875, ∆η=0.0375 (∆R<0.095)
    For the reconstruction of the clusters from π0’s, the following are used:
     • ∆R<0.1 to start with, because the ET is very low
     • ∆φ=0.0875, ∆η=0.0375: 7x3 cells
     • ∆φ=0.0625, ∆η=0.0375: 5x3 cells
     • ∆R<0.0375: 3x3 cells
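
The ∆φ, ∆η and ∆R values quoted for the 5x3 and 7x3 photon windows are consistent with half-widths of a window built from 0.025 x 0.025 cells. That cell size is an assumption inferred from the quoted numbers, not something stated on the slide, and the function names below are hypothetical.

```python
import math

CELL_SIZE = 0.025  # assumed cell size in both eta and phi


def window_half_widths(n_phi, n_eta, cell=CELL_SIZE):
    """Half-widths (d_phi, d_eta) of a window of n_phi x n_eta cells."""
    return n_phi * cell / 2.0, n_eta * cell / 2.0


def window_delta_r(n_phi, n_eta, cell=CELL_SIZE):
    """Distance from the window centre to its corner in eta-phi space."""
    d_phi, d_eta = window_half_widths(n_phi, n_eta, cell)
    return math.hypot(d_phi, d_eta)


for n_phi, n_eta in [(5, 3), (7, 3)]:
    d_phi, d_eta = window_half_widths(n_phi, n_eta)
    print(f"{n_phi}x{n_eta} cells: d_phi={d_phi:.4f}, d_eta={d_eta:.4f}, "
          f"delta_R={window_delta_r(n_phi, n_eta):.3f}")
```

For the 5x3 window this gives ∆φ=0.0625 and ∆η=0.0375 with a corner distance of about 0.073, and for 7x3 about 0.095, matching the limits quoted above.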

   • For π±’s:
    From the LAr TestBeam analysis, the cluster sizes for pions are:
     • 7x7 cells (∆R<0.12),
     • 9x7 cells (∆R<0.16),
     • 11x11 cells (∆R<0.20)…
    For the reconstruction of the clusters from π±’s:
     • ∆R<0.4
     • ∆R<0.2
     • ∆R<0.1

   • For neutrons: the shower of the neutrons should be as wide as that of the π±’s.
    So, in principle:
     • ∆R<0.1, ∆R<0.2 and ∆R<0.4
ET Resolution with Cone algorithms
The best resolution is always obtained with ∆R<1.0,
but this cone includes more than the shower of
one particle.

For π±’s: the best resolution is for the TRACK-cone
with ∆R<0.4, but ∆R<0.2 also gives a good
resolution and allows a better definition of the
shower of a single π±.
For neutrons: the best resolution is with ∆R<0.4,
but ∆R<0.2 still gives a very good resolution.
In both cases, ∆R<0.1 is too strict to define
hadronic particles.

For π0’s: the resolution with ∆R<0.1 is the best.
Clusters with 7x3 and 5x3 cells give a good
resolution, but not as good. 3x3 is too strict. They
could be useful when electronic noise is applied.
Clustering Algorithms Comparison
  The best algorithm for the reconstruction of
  the clusters from single particles at very low
  ET (without electronic noise) is, in each case:
   • For π±’s: Track-cone with ∆R<0.2 (Truth-cone
      is close, but with ∆R<0.4)
   • For neutrons: Truth-cone with ∆R<0.2 in
      general, but TOPO with Seed_cut=4 and
      Neigh_cut=2 is very close and is better at
      1 and 3 GeV.
   • For π0’s: Truth-cone with ∆R<0.1.
          • The EGAMMA cluster gives worse resolution, in
           general, than TOPO and Truth-cone, but
           gives the best resolution of all at 1 GeV.

  In any case, the results from the TOPO algorithm
  with Seed_cut=4 and Neigh_cut=2 are very
  competitive for neutrons and π0’s.
  For π±’s, TOPO is a good algorithm but not yet good enough
  (new versions of the TopoCluster package in the
  newer Athena release 8.2.0 will need to be tested).
4)Topocluster analysis with Electronic Noise
 The energy deposited inside a TopoCluster comes from the generated particles, but also
 from the electronic noise.
 [Figure: three panels, for π±’s, neutrons and π0’s]




Asking for a minimum value of ET in Seed Cell and Neighbor cells:
     Seed Cell >200MeV
     Neighbor cells >80MeV
a similar value of             without noise is obtained.
After these cuts, the size of the TopoCluster is up to 14 times smaller.
This difference is more important for the EM calorimeter, because there the level of
noise with respect to the signal is higher.
[Figure: three panels, for π±’s, neutrons and π0’s]

The ET resolution gets worse when these cuts are applied: there is a loss in the
energy reconstructed in the clusters. WHY?
Because a single general threshold has been applied to the cell ET for the whole calorimeter,
while the electronic noise contribution is different in each layer of LAr and Tile.



  Seed Cell >200MeV
  Neighbor cells >80MeV
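
The difference between one global threshold and layer-dependent thresholds can be illustrated with a small sketch. The layer names and noise values below are hypothetical placeholders chosen only to show the effect; they are not ATLAS numbers and do not come from the slides. Only the 200 MeV global seed threshold is taken from the text above.

```python
# Hypothetical per-layer noise in MeV: placeholders, NOT values from the slides.
NOISE_MEV = {"layer_A": 30.0, "layer_B": 100.0}

GLOBAL_SEED_CUT_MEV = 200.0  # global seed threshold quoted in the slides


def passes_global_cut(et_mev, cut_mev=GLOBAL_SEED_CUT_MEV):
    """Same absolute threshold for every cell, whatever its layer."""
    return et_mev > cut_mev


def passes_layer_cut(et_mev, layer, n_sigma, noise_mev=NOISE_MEV):
    """Threshold scaled by the noise of the layer the cell belongs to."""
    return et_mev > n_sigma * noise_mev[layer]


# A 150 MeV cell in a quiet layer: 5 sigma of signal, lost by the global cut.
quiet_cell = (passes_global_cut(150.0), passes_layer_cut(150.0, "layer_A", 4.0))
# A 250 MeV cell in a noisy layer: only 2.5 sigma, kept by the global cut.
noisy_cell = (passes_global_cut(250.0), passes_layer_cut(250.0, "layer_B", 4.0))
print(quiet_cell, noisy_cell)
```

With these placeholder numbers the global cut throws away a significant cell in the quiet layer and keeps a noise-like cell in the noisy one, which is the kind of mismatch the explanation above points to.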
Conclusions
   • WITHOUT NOISE:
     • The best E resolution for VLE particles is obtained with cone
        algorithms
     • TopoCluster is a very competitive algorithm, provided these changes are made:
            • Using CaloNoiseTool to model the EM Noise
            • Applying lower thresholds to Seed and Neighbor cells:
                  SeedCut=4 and NeighborCut=2
        TopoCluster is even better than the EGamma cluster for π0’s.

   • WITH NOISE:
     • The E resolution gets worse for TopoCluster
     • If we try to remove the electronic noise, we also lose energy from the
        particles
            • ET thresholds will need to be applied in each layer of LAr and Tile
