#import "importer/main.typ": *
#import "helpers.typ": *

= Halo mass history <halo_mass_history>

This section shows the impact of halo growth on the resulting radiation profiles and motivates a more precise treatment of the halo mass history. We then show how simulation data can be leveraged to refine the model.
// Don't like refined simulation


== Modeling mass accretion

As described in @hmreio, the fundamental assumption of #beorn is the halo model of reionization by @schneider2023cosmologicalforecast21cmpower.
// no need to recite?
It describes how observables of reionization can be parametrized in terms of the halo mass and, more specifically, its rate of change, since they derive from the star formation rate expressed in @eq:star_formation_rate.
In this simplified model, for a given star formation efficiency
#footnote[
  Note that the assumption of a fixed star formation efficiency or even an analytic expression as a function of halo mass is a simplification.
  // Citation
  The investigation of stochasticity has been subject to separate research (e.g. "missing").
],
the halo mass history is the single most impactful property besides the mass itself.


#beorn's goal is to provide simulations of the map-level contributions to the 21-cm signal, meaning that we cannot rely on a distribution of halo masses and accretion rates alone. Instead, #beorn leverages large-scale N-body simulations to provide a spatial distribution of halos. In the first iteration, halo growth was modeled as exponential,
$
  M_"h" (z) = M_"h" (z_0) dot exp[-alpha (z - z_0)]
$ <eq:exponential_growth>
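where $alpha = - dot(M_"h") / M_"h"$ is a free parameter describing the specific mass accretion rate, with the dot denoting the derivative with respect to redshift as implied by @eq:exponential_growth. Following `@???`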
// TODODO
a value of $alpha = 0.79$ was used as a fiducial value for all halos, independent of their mass or redshift. This meant that the requirements on the simulation data were minimal: Only a single halo catalog at a given redshift was required to generate a map at that redshift.
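
As a minimal sketch of @eq:exponential_growth, the model can be evaluated as follows. The function names and argument layout are hypothetical and chosen for illustration only; they are not taken from the #beorn code base.

```python
import numpy as np

ALPHA_FIDUCIAL = 0.79  # fiducial value quoted in the text


def halo_mass_exponential(z, m_h0, z0, alpha=ALPHA_FIDUCIAL):
    """Halo mass at redshift z for a halo of mass m_h0 at redshift z0."""
    z = np.asarray(z, dtype=float)
    return m_h0 * np.exp(-alpha * (z - z0))


def accretion_rate_per_redshift(z, m_h0, z0, alpha=ALPHA_FIDUCIAL):
    """dM_h/dz of the exponential model, which equals -alpha * M_h(z)."""
    return -alpha * halo_mass_exponential(z, m_h0, z0, alpha)
```

Because the model depends only on the mass at $z_0$ and on $alpha$, a single halo catalog suffices to reconstruct the assumed history at any earlier redshift.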

Using a simple exponential growth model is a significant simplification of the complex process of halo growth
// maybe a citation
but the most obvious
// maybe a better word
limitation is the assumption of a constant accretion rate $alpha$ for all halos, independent of their position, mass or redshift. In a realistic scenario, we expect the accretion rate to correlate with both halo mass and redshift, in addition to the stochasticity of the accretion process. From a statistical perspective, this has been investigated by @Schneider_2021 who also consider halo growth following the extended Press-Schechter formalism. This more detailed treatment shows that in particular small scales deviate from the simple exponential growth model. From a simulation perspective, an even more precise treatment is possible, since the growth history of each halo is already encoded in the successive snapshots of the N-body simulation. Ignoring this information introduces inconsistencies, because halos are painted with profiles that might not reflect their actual growth history.


// In a purely formal investigation where a qualitative prediction is derived from a well-defined halo mass distribution, the mass history is simply obtained as a direct derivation from the mass distribution. The simulations made by #beorn aim to provide 3D data that allows for quantitative conclusions. To this end a spatial distribution of the halo mass history is required, as provided by large scale simulations
// #footnote[
//   As described previously the halo model allows us to restrict the simulation to dark matter only, allowing for a more efficient simulation of the large scale structure.
// ]
// // Cite pkdgrav, Illustris, THESAN
// .

// @Schneider_2021 already compared exp growth to other models and found that following a more rigorous EPS approach show less growth at small masses.
// // e.g. papers like "2309...." suggest a revised halo mass growth.


== Effect on radiation profiles
#let notebook = json("../workdir/11_visualization/alpha_dependence_of_profiles.ipynb")

To illustrate the necessity of a more precise treatment of the halo mass history, we first investigate the effect of different mass accretion rates on the resulting radiation profiles. To this end, we consider halos at fixed masses and vary their accretion rates around the fiducial value of $alpha = 0.79$.

#figure(
  image_cell(notebook, cell_id: "profile_plot_alpha_dependence"),
  caption: [
    Flux profiles around halos with varying accretion rates.
    _Left:_ Profile of the Lyman-$alpha$ coupling coefficient.
    _Center:_ Profile of the kinetic temperature $T_k$.
    _Right:_ Ionization fraction profile.
    The effect of different mass accretion rates is visualized by the color gradient where bluer colors correspond to lower accretion rates and redder colors to higher accretion rates.]

) <fig:profile_plot_alpha_dependence>

@fig:profile_plot_alpha_dependence shows the three relevant profiles, computed for $M_"h1" = ??$ and $M_"h2" = ??$. The variation of the accretion rate leads to noticeable differences in all three profiles, even at large radial distances. The trend is clear and consistent across all three profiles: higher accretion rates lead to higher fluxes, so that the influence of the halo extends to larger radii. This is expected, as a higher accretion rate leads to a higher star formation rate and thus to the production of more photons.

// TODO - how far should I comment on that?
This picture is more complex once we consider a distribution of accretion rates instead of a single value.
// Do I need to show a plot of that as well?
When considering a distribution, the dominant contribution comes from the mean accretion rate, while the scatter around the mean has a significantly smaller effect. We do not elaborate on the stochasticity of the accretion rate since the use of #nbody simulations allows for a more sophisticated investigation. Instead of assuming pure stochasticity, we can extract the actual growth history of each halo and use it to assign a more meaningful accretion rate.


== Merger trees

=== Using THESAN

In order to generate precise map-level predictions of the 21-cm signal, #beorn combines the halo model of reionization with large-scale #nbody simulations which provide realistic snapshots of the dark matter distribution. These snapshots give a spatial context to the generated profiles.

As described in @procedure, #beorn was initially used to postprocess the #pkdgrav
// cite!
simulation suite and to obtain a meaningful signal capable of constraining astrophysical parameters related to star formation. The aim of this thesis is not merely to increase the precision but to leverage the mass history that can be extracted directly from the simulation to refine the underlying model.

To this end, we use the publicly available data from the #thesan simulation suite
#cite(<Kannan_2021>, form: "normal")
#cite(<Garaldi_2022>, form: "normal")
#cite(<Smith_2022>, form: "normal")
. The #thesandark simulation in particular is dark-matter-only and already provides halo catalogs and merger trees generated by the `LHaloTree` tree builder of @Springel2005. This allows us to extract the growth of each halo across different snapshots without significant preprocessing.

With a box length of $95.5 "cMpc"$ the simulation provides a sufficient volume to avoid box size effects
// CITATION
while still allowing us to iterate quickly and test the refined model without excessive computational cost. The simulation has two variants with different mass resolutions:
#thesandark 1 with $2100^3$ particles for a mass resolution of $3.70 dot 10^6 M_dot.circle$ per particle and #thesandark 2 with $1050^3$ particles for a mass resolution of $2.96 dot 10^7 M_dot.circle$ per particle. Unless specified otherwise, we use #thesandark 2 since it provides a good compromise between resolution and computational cost. We use #thesandark 1 to perform convergence tests as described in @validation.


// TODO - below
// @Kannan_2021 also shows that reionization history is different for different gas densitites, i.e. halo masses. We also show from a profile perspective that treating halo accretion as a free parameter can lead to significant differences in the resulting profiles.


// Thesan halo catalog and the motivation to increase the cutoff.

// At the same time THESAN low mass halos seem overabundant which is why we use boosted models of star formation efficiency.


=== Main progenitor branch
#let notebook = json("../workdir/11_visualization/show_trees.ipynb")

Growth of structure in #lambdacdm is hierarchical: small structures form first and merge to form larger structures. The growth of halos can be represented using merger trees. These tree-like structures describe the halo history in terms of the mergers of its smaller progenitors. A merger tree is constructed by linking halos in consecutive snapshots of the simulation, where each halo has a single descendant but potentially multiple progenitors.
// As described in ... THESAN

The main progenitor serves as a tracer of the halo mass history if we assume that the halo mass growth is dominated by mergers.
// Has this been explicitlyshown somewhere?
Beyond that, we expect the main progenitor to be most representative of the baryonic conditions inside and outside the halo as the merger occurs.
// Might need to reformulate
For the identification of accretion rates for #beorn we therefore focus solely on the main progenitor branch of each halo.

Reducing the breadth of the merger tree reduces the data volume significantly and allows us to implement the tree handling in memory without excessive computational cost. To this end, we provide a simple implementation of a tree walker that copies the simplified trees to a single file for easier access. No further preprocessing is required, which allows #beorn to keep all parameters related to the mass history as free parameters to be specified at runtime.
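
The idea of such a tree walker can be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: the flat arrays `first_progenitor` and `mass` are a hypothetical in-memory layout modeled on `LHaloTree`-style pointers, where an index of `-1` marks a halo without progenitor.

```python
import numpy as np


def main_progenitor_branch(first_progenitor, mass, start, n_max):
    """Masses along the main progenitor branch, newest halo first.

    first_progenitor[i] is the index of the main progenitor of halo i
    within the same tree, or -1 if the halo has no progenitor
    (hypothetical flat-array layout, modeled on LHaloTree-style pointers).
    """
    branch = []
    index = start
    while index != -1 and len(branch) < n_max:
        branch.append(mass[index])
        index = first_progenitor[index]
    return np.array(branch)


# Toy tree: halo 0 descends from 1, which descends from 3, then 4.
# Halo 2 is a secondary progenitor and is never visited.
toy_first_progenitor = np.array([1, 3, -1, 4, -1])
toy_mass = np.array([8.0, 5.0, 2.0, 3.0, 1.0])
toy_branch = main_progenitor_branch(toy_first_progenitor, toy_mass, 0, 10)
```

The sketch assumes one entry per snapshot along the branch. Branches that skip snapshots would need the snapshot number of each entry to be stored alongside the mass.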


=== Fitting procedure
The restriction to the main progenitor reduces the mass history to a one-dimensional function of redshift, compatible with the original assumption of an exponential growth model as in @eq:exponential_growth.

#figure(
  image_cell(notebook, cell_id: "merger_tree_and_fitting"),
  caption: [
    Usage of merger tree fitting to obtain accretion rate estimates.
    _Left:_ Collection of normalized main progenitor branches with mass $M_"mp"$ starting at $z = 10.3$ and looking back over $n=10$ snapshots. Selected histories and their corresponding exponential fits are highlighted.
    _Right:_ Distribution of best-fit accretion rates $alpha$ for all halos at $z=10.3$.
  ]

) <fig:merger_tree_and_fitting>

We use a linear regression in log-space to obtain estimates of the accretion rate $alpha$ for each halo. This is implemented in a vectorized fashion to allow for efficient processing of the full dataset. For this fit, we enforce the current halo mass as a boundary condition. This prevents inconsistent fits where the latest fitted mass deviates from the actual current halo mass. As a visualization of the fitting procedure, @fig:merger_tree_and_fitting shows a collection of normalized main progenitor branches starting at $z=10.3$ and looking back over $n=10$ snapshots. After fitting, we overlay the estimated exponential growth history for a selection of halos. The right panel shows the distribution of best-fit accretion rates $alpha$ for all halos at $z=10.3$.
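
A constrained fit of this kind can be sketched as below. Taking the logarithm of @eq:exponential_growth gives a straight line in $z - z_0$ with slope $-alpha$. Fixing the intercept to the current mass leaves the slope as the only free parameter, which has a closed-form least-squares solution. The function name and array layout are assumptions made for this example and do not necessarily match the actual implementation.

```python
import numpy as np


def fit_alpha(redshifts, masses):
    """Least-squares alpha per halo with the current mass held fixed.

    redshifts: shape (n_snap,), redshifts[0] is the current snapshot z_0.
    masses: shape (n_halos, n_snap), masses[:, 0] is the current halo mass.
    """
    redshifts = np.asarray(redshifts, dtype=float)
    masses = np.asarray(masses, dtype=float)
    dz = redshifts - redshifts[0]                # x_i = z_i - z_0
    log_ratio = np.log(masses / masses[:, :1])   # y_i = ln(M_i / M_0)
    # Line through the origin y = -alpha * x: the fixed intercept enforces
    # M(z_0) = current mass, so only the slope is fitted.
    return -(log_ratio @ dz) / np.dot(dz, dz)
```

The matrix product evaluates the sum over snapshots for all halos at once, so no explicit loop over halos is needed.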

Similarly to the halo mass itself, the accretion rate can then be taken into account during the painting procedure by selecting a profile corresponding to the halo mass and accretion rate of each halo. Hence the accretion rate is binned as well, and the range that is covered during the painting is finite. We leave this range as a free parameter to be specified at runtime.
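
One possible way to bin the accretion rates is sketched below. The function name, the linear bin spacing and the clipping of out-of-range values to the outermost bins are assumptions made for illustration. The text above only states that the covered range is finite and configurable.

```python
import numpy as np


def assign_alpha_bins(alpha, alpha_min, alpha_max, n_bins):
    """Bin index and bin-centre alpha for each halo.

    Values outside [alpha_min, alpha_max] are clipped to the outermost bins
    (an assumption of this sketch), since the covered range is finite.
    """
    edges = np.linspace(alpha_min, alpha_max, n_bins + 1)
    centres = 0.5 * (edges[:-1] + edges[1:])
    clipped = np.clip(alpha, alpha_min, alpha_max)
    # Interior edges only, so indices run from 0 to n_bins - 1.
    index = np.digitize(clipped, edges[1:-1])
    return index, centres[index]
```

Each halo is then painted with the profile computed for its mass bin and for the returned bin-centre value of $alpha$.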


== Resulting accretion rates
#let notebook = json("../workdir/11_visualization/evolution_of_alphas.ipynb")

#figure(
  image_cell(notebook, cell_id: "alpha_evolution_vs_redshift"),
  caption: [
    Evolution of the mean of the fitted accretion rates and the $1 sigma$ standard deviation (shaded area). For a given snapshot we consider different numbers of snapshots $n$.
  ]
) <fig:alpha_evolution_vs_redshift>

In order to obtain a sensible range of $alpha$ values to cover during the painting procedure, we investigate the global result of the fitting procedure. Our fitting method trades absolute precision for speed and convenience: not all halos are well represented in the merger tree. Additionally, we need to account for unphysical or incomplete histories due to limitations of the halo finder. We discuss this step in @implementation. For the current investigation we disregard these halos and only consider well-behaved, fully resolved trees. @fig:alpha_evolution_vs_redshift shows how the fitted accretion rate evolves when starting from different snapshots. We plot the mean and $1 sigma$ standard deviation of the resulting distribution of $alpha$ values, considering different lookback lengths to assess the stability of the fitting procedure.
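
The selection of complete histories and the resulting statistics can be sketched as follows. The sketch assumes that missing branch entries are stored as `NaN`, which is a convention chosen for this example, and it uses the same fixed-intercept fit in log-space as described above. The function name is hypothetical.

```python
import numpy as np


def alpha_statistics(redshifts, masses, lookbacks):
    """Mean and standard deviation of fitted alpha for several lookbacks n.

    masses: shape (n_halos, n_snap), NaN where the branch is missing
    (assumed convention). Each n in lookbacks must be at least 2.
    Returns {n: (mean, std, n_used)} using complete histories only.
    """
    redshifts = np.asarray(redshifts, dtype=float)
    masses = np.asarray(masses, dtype=float)
    stats = {}
    for n in lookbacks:
        window = masses[:, :n]
        # Keep only halos whose branch is present over the full lookback.
        complete = np.all(np.isfinite(window) & (window > 0), axis=1)
        dz = redshifts[:n] - redshifts[0]
        log_ratio = np.log(window[complete] / window[complete][:, :1])
        alpha = -(log_ratio @ dz) / np.dot(dz, dz)
        stats[n] = (alpha.mean(), alpha.std(), int(complete.sum()))
    return stats
```

Returning the number of halos used per lookback makes it visible how strongly the sample shrinks as incomplete trees are discarded.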

We observe a clear stabilization of the mean accretion rate for longer lookbacks. Longer lookbacks are justified not only by their causal connection to the profiles but also because they help to absorb short-term fluctuations, most likely introduced by the halo finder. This is especially noticeable in the first few snapshots where the $1 sigma$ uncertainty is significantly higher. We attribute this to the overabundance of low-mass halos whose mass history is more erratic and harder to reconstruct, accentuated by displacements of the halos.

Numerically, the advantage of longer lookbacks is the stabilization of the fit, leading to reduced scatter in the resulting distribution. These behaviors stabilize once we consider a lookback of around $n = 10$ snapshots: both the mean and the standard deviation follow a stable trend, and the mean settles at $alpha approx 0.6$.
We attribute the slight offset of the means to the fact that discarding incomplete trees favors more massive halos at longer lookbacks. These halos are detected more reliably by the halo finder and are expected to show fewer fluctuations.

Physically, the lookback time is motivated by the flux profiles of the halos themselves. Since the profiles extend up to the $"Mpc"$ range (see e.g. @fig:profile_plot_alpha_dependence), we attribute to each profile the timescale over which the halo causally affects the region covered by that profile. For a profile of radius $ #sym.tilde.op 100 "Mpc"$ this time is on the order of $Delta t = 300 "Myr" #sym.arrow.l.r.double.long Delta z = 4$ (when looking back from a redshift of $z=8$). Given the spacing of snapshots in #thesan, a lookback of $n=10$ snapshots still spans less than this causal timescale. Since the fitted behavior stabilizes at this point, we suggest not going beyond it, as additional snapshots slow down the simulation considerably.
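
The order of magnitude of this estimate can be checked with a short calculation. The sketch below is a rough cross-check and not the computation used in the thesis: it ignores cosmic expansion for the light travel time, uses the analytic age of a flat universe containing only matter and a cosmological constant, and assumes Planck-like default parameters that are not taken from the text.

```python
import math

MPC_IN_M = 3.0857e22        # metres per megaparsec
C_IN_M_PER_S = 299_792_458  # speed of light
MYR_IN_S = 3.15576e13       # seconds per megayear (Julian years)
GYR_TIMES_KM_S_MPC = 977.8  # 1 / (1 km/s/Mpc) expressed in Gyr


def light_travel_time_myr(distance_mpc):
    """Time for light to cross distance_mpc, ignoring cosmic expansion."""
    return distance_mpc * MPC_IN_M / C_IN_M_PER_S / MYR_IN_S


def cosmic_age_myr(z, h0=67.74, omega_m=0.3089):
    """Age of a flat LCDM universe (matter + Lambda only) at redshift z.

    h0 and omega_m are illustrative defaults assumed for this sketch.
    """
    omega_l = 1.0 - omega_m
    prefactor = 2.0 / (3.0 * math.sqrt(omega_l)) * GYR_TIMES_KM_S_MPC / h0
    x = math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    return 1e3 * prefactor * math.asinh(x)


crossing_time = light_travel_time_myr(100.0)
elapsed_z12_to_z8 = cosmic_age_myr(8.0) - cosmic_age_myr(12.0)
```

Under these assumptions both `crossing_time` and `elapsed_z12_to_z8` come out at a few hundred Myr, consistent with the order of magnitude quoted above.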