[[TOC]]

= Spectrum Analysis =

== Theory ==

=== Spectrum ===

The differential flux \(\phi(E)\) per area, time and energy interval is defined as
\[\phi(E) = \frac{dN}{dA\cdot dt\cdot dE}\]

Often \(\phi(E)\) is also referred to as \(\frac{dN}{dE}\) as observation time and effective collection area is a constant.

For an observation with an effective observation time \(\Delta T=\sum\delta t_i\), this yields in a given Energy interval \(\Delta E\):
\[\phi(\Delta E) = \frac{1}{A_0\cdot \Delta T}\frac{N(\Delta E)}{\epsilon(\Delta E)\cdot \Delta E}\]

For simplicity, in the following, \(\Delta E\) will be replaced by just \(E\) but always refers to to a given energy interval. If data is binned in a histogram, the relation between the x-value for the bin \(E_\textrm{x}\) and the corresponding interval \(\Delta E_\textrm{x}\) is not well defined. Resonable definitions are the bin center (usually in logarithmic bins) or the average energy.

=== Efficiency ===

The total area \(A_0\) and the corresponding efficiency \(\epsilon(E)\) are of course only available for simulated data. For simulated data, \(A_0\) is the production area and \(\epsilon(E)\) the corresponding energy dependent efficiency of the analysis chain. For a given energy bin, the efficiency is then defined as 

\[\epsilon(E) = \frac{N_\textrm{exc}(E)}{N_0(E)} \]

where \(N_0\) is the number of simulated events in this energy bin and \(N=N_{exc}\) the number of *excess* events that are produced by the analysis chain.

Note that the exact calculation of the efficiency \(\epsilon(\Delta E)\) depends on prior knowledge of the correct source spectrum \(N_0(E)\). Therefore, it is strictly speaking only correct if the simulated spectrum and the real spectrum are identical. As the real spectrum is unknown, special care has to be taken of the systematic introduced by the assumption of \(N_0(E)\).

=== Effective Collection Area ===

The effective area is then defined as \(A_\textrm{eff}(E)=\epsilon(E)\cdot A_0\). Note that at large distances \(R_0\) the efficiency \(\epsilon(R_0)\) vanishes, so that the effective area is an (energy dependent) constant while \(A_0=\pi R_0^2\) and the efficiency \(\epsilon(E)\) are mutually dependent.

=== Excess and Error ===

The number of excess events, for data and simulations, is defined as

\[N_\textrm{exc} = N_\textrm{sig} - \hat N_\textrm{bg}\]

where \(N_\textrm{sig}\) is the number of events identified as potential gammas from the source direction ('on-source') and \(N_\textrm{bg}\) the number of gamma-like events measured 'off-source'. Note that for Simulations, \(\hat N_\textrm{bg}\) is not necessarily zero for wobble-mode observations as an event can survive the analysis for on- and off-events, if this is not prevented by the analysis (cuts).

The average number of background events \(\hat N_\textrm{bg}\) is the total number of background events \(N_\textrm{bg}\) from all off-regions times the corresponding weight \(\frac{1}{5}\) (often referred to as \(\alpha\)). For five off-regions, this yields

\[\hat N_\textrm{bg} = \frac{N_\textrm{bg}}{5}\]

Assuming Gaussian errors, the statistical error is thus

\[\sigma^2(N_\textrm{exc}) = \left(\frac{dN_\textrm{exc}}{dN_\textrm{sig}}\right)^2\sigma^2(N_\textrm{sig}) +  \left(\frac{dN_\textrm{exc}}{d\hat N_\textrm{bg}}\right)^2\sigma^2(\hat N_\textrm{bg})= \sigma^2(N_\textrm{sig}) + \frac{1}{5^2}\sigma^2(N_\textrm{bg})\]

For data this immediately resolves to 

\[\sigma^2(N_\textrm{exc}) = N_\textrm{sig} + \frac{1}{5^2}N_\textrm{bg} \]

with the Poisson (counting) error \(\sigma^2(N_\textrm{sig,bg}) = N_\textrm{sig,bg}\).

=== Weights and Error ===

In the following \(N\) refers to a number of simulated events and \(M\) to a number of measured (excess) events.


\(n(\delta E, \delta\theta)\) is the number of produced events in the energy interval \(E\in\delta E=[e_\textrm{min};e_\textrm{max}]\) and the zenith angle interval \(\theta\in\delta\theta=[\theta_\textrm{min};\theta_\textrm{max}]\). The weighted number of events \(n'(\delta E, \delta\theta)\) in that interval is then

\[n'(\delta E, \delta\theta) = \sum_{i=0...n}^{E\in\delta E}\sum_{j=0...n}^{\theta\in\delta\theta} \omega(E_i, \theta_j)\]

Since \(\omega(E, \theta) = \rho(E)\cdot \tau(\theta) \) with \(\rho(E)\) the spectral weight to adapt the spectral shape of the simulated spectrum to the real (measured) spectrum of the source and \(\tau(\theta)\) the weight to adapt to oberservation time versus zenith angle.

\[n'(\delta E, \delta\theta) = \sum_{i=0...n}^{E\in\delta E}\sum_{j=0...n}^{\theta\in\delta\theta} \rho(E_i)\cdot\tau(\theta_j) = \sum_{i=0...n}^{E\in\delta E}\rho(E_i)\cdot\sum_{j=0...n}^{\theta\in\delta\theta} \tau(\theta_j)\]

The weighted number of produced events \(N'(\Delta E, \Delta\Theta)\) in the total energy interval \(E\in\Delta E=[E_\textrm{min};E_\textrm{max}]\) and the total zenith angle interval \(\theta\in\Delta\Theta=[\Theta_\textrm{min};\Theta_\textrm{max}]\) is then

\[N'(\Delta E, \Delta\Theta) = \sum_{i=0...N}^{E\in\Delta E}\sum_{j=0...N}^{\theta\in\Delta\Theta} \rho(E_i)\cdot\tau(\theta_j) = \sum_{i=0...N}^{E\in\Delta E}\rho(E_i)\cdot\sum_{j=0...N}^{\theta\in\Delta\Theta} \tau(\theta_j)\]

with \(N(\Delta E, \Delta\Theta)\) being the total number of produced events.

For a sum of weights, e.g. \(N' = \sum_N \rho_i\tau_i\) the corresponding error is
\[\sigma^2(N')^2 = \sum_N\left[\rho_i\cdot\sigma(\tau_i)+\tau_i\cdot\sigma(\rho_i)\right]^2\] 

As the energy is well defined, \(\sigma(\rho_i)=0\) and thus

\[\sigma^2(N')^2 = \sum_N\rho_i^2\cdot\sigma^2(\tau_i)\] 


The weights are defined as follow:

\[\rho(E) = \rho_0\frac{\phi_\textrm{src}(E)}{\phi_0(E)}\]

where \(\phi_0(E)\) is the simulated spectrum and \(\phi_\textrm{src}\) the (unknown) real source spectrum. \(\rho_0\) is a normalization constant. The zenith angle weights \(\tau(\delta\theta)\) in the interval \(\delta\theta\) are defined as

\[\tau(\delta\theta) = \tau_0\frac{\Delta T(\delta\theta)}{N(\delta\theta)}\]

where \(N(\delta\theta)\) is the number of produced events in the interval \(\delta\theta\) and \(\Delta T(\delta\theta)\) is the total observation time in the same zenith angle interval. \(\tau_0\) is the normalization constant. The error on the weight \(\tau=\tau(\delta\theta)\) in each individual \(\theta\)-bin with \(\Delta T=\Delta T(\delta\theta)\) and \(N=N(\delta\theta)\)is then

\[\sigma^2(\tau) = \left[\frac{d\tau}{d\Delta T}\sigma(\Delta T)\right]^2+\left[\frac{d\tau}{dN}\sigma(N)\right]^2= \tau^2\cdot\left[\left(\frac{\sigma(\Delta T)}{\Delta T}\right)^2+\left(\frac{\sigma(N)}{N}\right)^2\right]\]

While \(\sigma(\Delta T)/\Delta T \approx 1\textrm{s}/5\textrm{min}\) is given by the data acquisition and 1s per 5min run, \(\sigma(N)/N=1/\sqrt{N}\) is just the statistical error of the number of events.

As the efficiency \(\epsilon(\delta E)\) for an energy interval \(\delta E\) is calculated as

\[\epsilon(\delta E) = \epsilon(\delta E, \Delta\Theta) = \frac{N'_\textrm{exc}(\delta E, \Delta\Theta)}{N'_\textrm{src}(\delta E,\Delta\Theta)}\]

and \(N'_\textrm{exc}(\delta E,\Delta\Theta)\) and \(N'_\textrm{src}(\delta E,\Delta\Theta)\) are both expressed as the sum given above, the constants \(\rho_0\) and \(\tau_0\) cancel. 

The differential flux in an energy interval \(\delta E\) is then given as

\[\phi(\delta E) = \phi(\delta E,\Delta\Theta) = \frac{1}{A_0\cdot \Delta T}\frac{M_\textrm{exc}(\delta E)}{\epsilon(\delta E)\cdot \delta E} = \frac{M_\textrm{exc}(\delta E)}{N'_\textrm{exc}(\delta E)}\cdot \frac{N'_\textrm{src}(\delta E)}{A_0\cdot\Delta T\cdot \delta E}\]



Where \(A_0\) is total area of production and \(\Delta T\) the total observation time. The number of measured excess events is in that energy interval is \(M_\textrm{exc}(\delta E)=M_\textrm{exc}(\delta E, \Delta\Theta)\).

Using Gaussian error propagation, the error in a given energy interval \(\delta\) is then given by


\[\sigma^2(\phi) = \left(\frac{1}{A_0\cdot \Delta T\cdot\delta E}\right)^2\cdot\left[\left(\frac{d\phi'}{dM_\textrm{exc}}\right)^2\sigma^2(M_\textrm{exc}) + \left(\frac{d\phi'}{dN'_\textrm{exc}}\right)^2\sigma^2(N'_\textrm{exc}) + \left(\frac{d\phi'}{dN'_\textrm{src}}\right)^2\sigma^2(N'_\textrm{src})\right]\]

with \(\phi':=A_0\cdot\Delta T\cdot\delta E\cdot\phi\).


\[\rightarrow\quad\sigma^2(\phi) = \phi^2 \cdot\left[\left(\frac{\sigma(M_\textrm{exc})}{M_\textrm{exc} }\right)^2 + \left(\frac{\sigma(N'_\textrm{exc})}{N'_\textrm{exc}}\right)^2 + \left(\frac{\sigma(N'_\textrm{src})}{N'_\textrm{src}}\right)^2\right]\]

== Conceptual Example ==

=== Define Binnings ===

=== Get Data File List ===

A list with file IDs containing the events to be analyed is required a.t.m. The following query retrieves such a list and fills a temporary table (`DataFiles`) with the IDs.

{{{#!Spoiler
{{{#!sql
CREATE TEMPORARY TABLE DataFiles
(
   FileId INT UNSIGNED NOT NULL,
   PRIMARY KEY (FileId) USING HASH
) ENGINE=Memory
AS
(
   SELECT
      FileId
   FROM
      factdata.RunInfo
   WHERE
      %0:where
   ORDER BY
      FileId
)
}}}
}}}

`%0:where` is a placeholder, for example for 

{{{#!sql
fZenithDistanceMean<30 AND fThresholdMinSet<350 AND 
fSourceKEY=5 AND 
fRunTypeKEY=1 AND 
fNight>20161201 AND fNight<20170201 AND
fR750Cor>0.9e0*fR750Ref
}}}

=== Get Observation Time ===

The following query bins the effective observation time of the runs listed above in zenith angle bins and stores the result in a temporary table (`ObservationTime`). Note that the result contains only those bins which have entries.

{{{#!Spoiler
{{{#!sql
CREATE TEMPORARY TABLE ObservationTime
(
   `.theta` SMALLINT UNSIGNED NOT NULL,
   OnTime FLOAT NOT NULL,
   PRIMARY KEY (`.theta`) USING HASH
) ENGINE=Memory
AS
(
   SELECT
      INTERVAL(fZenithDistanceMean, %0:bins) AS `.theta`,
      SUM(TIME_TO_SEC(TIMEDIFF(fRunStop,fRunStart))*fEffectiveOn) AS OnTime
   FROM
      DataFiles
   LEFT JOIN
      factdata.RunInfo USING (FileId)
   GROUP BY
      `.theta`
   ORDER BY
      `.theta`
)
}}}
}}}

`%0:bins` is a placeholder for the bin boundaries, e.g. `5, 10, 15, 20, 25, 30` (five bins between 5° and 30° plus underflow and overflow).

=== Get Monte Carlo File List ===

The next query obtains all Monte Carlo runs which have their !ThetaMin or !ThetaMax within one of the bins obtained in the previous query (so all MC runs that correspond to bins in which data is available). Strictly speaking, this step is not necessary, but it accelerats further processing. In addition (here as an example) only runs with even FileIDs are obtained as test-runs (assuming that odd runs were used for training). The resulting FileIDs are stored in a temporary table (`MonteCarloFiles`).

{{{#!Spoiler
{{{#!sql
CREATE TEMPORARY TABLE MonteCarloFiles
(
   FileId INT UNSIGNED NOT NULL,
   PRIMARY KEY (FileId) USING HASH
) ENGINE=Memory
AS
(
   SELECT
      FileId
   FROM
      ObservationTime
   LEFT JOIN
      BinningTheta ON `.theta`=bin
   LEFT JOIN
      factmc.RunInfoMC
   ON
      (ThetaMin>=lo AND ThetaMin<hi) OR (ThetaMax>lo AND ThetaMax<=hi)
   WHERE
      PartId=1 AND
      FileId%%2=0
   ORDER BY
      FileId
)
}}}
}}}


=== Get Zenith Angle Histogram ===

The following table creates a temporaray table (`EventCount`) internally which bins the !MonetCarlo files from the file list in !MonteCarloFiles in zenith angle bins. This temporary table ois then joined with table containing the binning for the data files (!EventCount) and for each bin, the ratio (!ZdWeight) and the corresponding error (!ErrZdWeight) is calculated (assuming an error on the on-time of 1s per 5min). To have also the in edges in the same table, the binning is joined as well.

{{{#!Spoiler
{{{#!sql
CREATE TEMPORARY TABLE ThetaHist
(
   `.theta`    SMALLINT UNSIGNED NOT NULL,
   lo          DOUBLE            NOT NULL COMMENT 'Lower edge of zenith distance bin in degree',
   hi          DOUBLE            NOT NULL COMMENT 'Upper edge of zenith distance bin in degree',
   CountN      INT UNSIGNED      NOT NULL,
   OnTime      FLOAT             NOT NULL,
   ZdWeight    DOUBLE            NOT NULL COMMENT 'tau(delta theta)',
   ErrZdWeight DOUBLE            NOT NULL COMMENT 'sigma(tau)',
   PRIMARY KEY (`.theta`) USING HASH
) ENGINE=Memory
AS
(
   WITH EventCount AS
   (
      SELECT
         INTERVAL(DEGREES(Theta), %0:bins) AS `.theta`,
         COUNT(*) AS CountN
      FROM
         MonteCarloFiles
      LEFT JOIN
         factmc.OriginalMC USING(FileId)
      GROUP BY
         `.theta`
   )
   SELECT
      `.theta`, lo, hi,
      CountN,
      OnTime,
      OnTime/CountN AS ZdWeight,
      (OnTime/CountN)*SQRT(POW(1/300, 2) + 1/CountN) AS ErrZdWeight
   FROM
      ObservationTime
   LEFT JOIN
      EventCount USING(`.theta`)
   LEFT JOIN
      BinningTheta ON `.theta`=bin
   ORDER BY
      `.theta`
)
}}}
}}}

`%0:bins` is a placeholder for the bin boundaries, e.g. `5, 10, 15, 20, 25, 30` (five bins between 5° and 30° plus underflow and overflow). It should be identical to the binning used for data files in !DataFiles.


=== Analyze Data ===

=== Analyze Monte Carlo Data ===

=== Summarize Corsika Production ===

{{{#!Spoiler
{{{#!sql
CREATE TEMPORARY TABLE SimulatedSpectrum
(
   `.energy` SMALLINT UNSIGNED NOT NULL COMMENT 'Bin Index [MC Energy]',
   CountN    DOUBLE            NOT NULL,
   CountW    DOUBLE            NOT NULL,
   CountW2   DOUBLE            NOT NULL,
   PRIMARY KEY (`.energy`) USING HASH
) ENGINE=Memory
AS
(
   SELECT
      INTERVAL(LOG10(Energy), %0:energyest) AS `.energy`,
      COUNT(*) AS CountN,
      SUM(    (%2:spectrum)/pow(Energy, SpectralIndex)   ) AS CountW,
      SUM(POW((%2:spectrum)/pow(Energy, SpectralIndex),2)) AS CountW2
   FROM
      MonteCarloFiles
   LEFT JOIN
      factmc.RunInfoMC USING (FileId)
   LEFT JOIN
      factmc.OriginalMC USING (FileId)
   GROUP BY
      `.energy`
   ORDER BY
      `.energy`
)
}}}
}}}


=== Result (Spectrum) ===

=== Result (Threshold) ===

=== Result (Migration) ===