
#import "importer/main.typ": *
#import "helpers.typ": *

= Halo mass history <halo_mass_history>

This section shows the impact of the halo growth on the resulting radiation profiles and motivates the need for a more precise treatment of the halo mass history. We show how to leverage simulation data for an improved consistency during the simulation.


== Modeling mass accretion

As described in @hmreio, the fundamental assumption of #beorn is the halo model of reionization by @schneider2023cosmologicalforecast21cmpower.
// no need to recite?
It describes how observables of reionization can be parametrized in terms of the halo mass and, more specifically, its rate of change, since they are derived from the star formation rate expressed in @eq:star_formation_rate.
In this simplified model and for a given star formation efficiency, the halo mass history is the single most impactful property besides the mass itself.


#beorn's goal is to provide simulations of the map-level contributions to the 21-cm signal, meaning that we cannot rely on a distribution of halo masses and accretion rates alone. Instead, #beorn leverages large scale N-body simulations to provide a spatial distribution of halos. Halo growth is considered to follow the exponential growth model
$
  M_"h" (z) = M_"h" (z_0) dot exp[-alpha (z - z_0)]
$ <eq:exponential_growth>
where $alpha = - dot(M_"h") / M_"h"$ is a free parameter describing the specific mass accretion rate and the dot denotes the derivative with respect to redshift. Following @10.1093-mnras-stt1338, the initial version assumed a value of $alpha = 0.79$ for all halos, independent of their mass or redshift. This means that the requirements for the simulation data were minimal: Only a single halo catalog at a given redshift was required to generate a map at that redshift.
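
As a minimal illustration of @eq:exponential_growth, the following Python sketch evaluates the exponential growth model for a halo anchored at a reference redshift. The function name `halo_mass` and the example mass are hypothetical and not taken from the code base; only the functional form and the fiducial value $alpha = 0.79$ come from the text above.

```python
import numpy as np

def halo_mass(z, m0, z0, alpha=0.79):
    """Halo mass at redshift z for exponential growth anchored at (z0, m0)."""
    z = np.asarray(z, dtype=float)
    return m0 * np.exp(-alpha * (z - z0))

# Looking back from z0 = 8 to z = 9 reduces the mass by a factor exp(-alpha).
m_now = 1.0e10  # illustrative mass in solar masses (assumed value)
m_past = halo_mass(9.0, m_now, z0=8.0)
```

Since the model is anchored at the current mass, a single catalog entry and a value of $alpha$ fully determine the assumed history.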

Using a simple exponential growth model is a significant simplification of the complex and time-sensitive process of halo growth (e.g. @McBride_2009). Another limitation is the assumption of a constant accretion rate $alpha$ for all halos, independent of their position, mass or redshift. In a realistic scenario we expect to observe significant stochasticity of the accretion process as well as systematic effects from the halo mass and the redshift. From a statistical perspective, this has been investigated by @Schneider_2021, who also consider halo growth following the extended Press-Schechter formalism. This more detailed treatment shows that small scales in particular deviate from the simple exponential growth model. From a simulation perspective, an even more precise treatment is possible since the growth history of each halo is already encoded in the successive snapshots of the N-body simulation. Ignoring this information introduces inconsistencies because halos are painted using profiles that might not reflect their actual growth history.


== Effect on radiation profiles
#let notebook = json("../workdir/11_visualization/alpha_dependence_of_profiles.ipynb")

To illustrate the necessity of a more precise treatment of the halo mass history, we first investigate the effect of different mass accretion rates on the resulting radiation profiles. To this end, we consider halos at fixed masses and vary their accretion rates around the fiducial value of $alpha = 0.79$.

#figure(
  image_cell(notebook, cell_id: "profile_plot_alpha_dependence"),
  caption: [
    Flux profiles around halos with varying accretion rates.
    _Left:_ Profile of the Lyman-$alpha$ coupling coefficient $rho_alpha$.
    _Center:_ Heating profile $rho_h$.
    _Right:_ Ionization fraction profile $x_"HII"$.
    The effect of different mass accretion rates is visualized by the color gradient where bluer colors correspond to lower accretion rates and redder colors to higher accretion rates.]

) <fig:profile_plot_alpha_dependence>

@fig:profile_plot_alpha_dependence shows the three relevant profiles, computed for $M_"h" = 6.08 dot 10^11 M_dot.circle$. Varying the accretion rate leads to noticeable differences in all three profiles, even at large radial distances. There is a clear and consistent trend for all three profiles: Higher accretion rates lead to higher fluxes, i.e. an effect that reaches farther out. This is expected, as a higher accretion rate leads to a higher star formation rate and thus to the production of more photons. Note that the spread of the profiles appears symmetric. This is due to the logarithmic scaling of the plot and the wider range of $alpha$ values above the fiducial value. In other words, the magnitude of the effect appears to be the same when shifting the accretion rate above or below the fiducial value.

// TODO - how far should I comment on that?
This picture is more complex once we consider a distribution of accretion rates instead of a single value across all halos. We note that the dominating factor when considering a distribution is the contribution from the mean accretion rate. The scatter around the mean has a significantly smaller effect. We do not elaborate on the stochasticity of the accretion rate since the usage of #nbody simulations allows for a more sophisticated investigation. Instead of assuming pure stochasticity we can extract the actual growth history of each halo and use it to assign a more meaningful accretion rate.


== Merger trees

=== Using #thesan simulation data

In order to generate precise map-level predictions of the 21-cm signal, #beorn combines the halo model of reionization with large-scale #nbody simulations which provide realistic snapshots of the dark matter distribution. They give a spatial context to the generated profiles.

As described in @procedure, #beorn was initially used to postprocess the #pkdgrav
#cite(<potter2016pkdgrav3trillionparticlecosmological>, form: "normal")
simulation suite and to obtain a meaningful signal capable of constraining astrophysical parameters related to star formation. The aim of this thesis is not merely to increase the precision but to develop a proof of concept that leverages the mass history, which can be extracted directly from the simulation, to refine the underlying model.

To this end, we use the publicly available data from the #thesan simulation suite
#cite(<Kannan_2021>, form: "normal")
#cite(<Garaldi_2022>, form: "normal")
#cite(<Smith_2022>, form: "normal").
The #thesandark simulation in particular provides a dark-matter-only simulation that conveniently already includes halo catalogs and merger trees generated by the `LHaloTree` tree builder by @Springel2005. This will allow us to extract the growth of each halo across different snapshots without significant preprocessing.

With a box length of $95.5 "cMpc"$ the simulation provides a sufficient volume to avoid box size effects
(e.g. @Iliev_2014)
while still allowing us to iterate quickly and test the refined model without excessive computational cost. The simulation has two variants with different mass resolutions:
#thesandark 1 with $2100^3$ particles for a mass resolution of $3.70 dot 10^6 M_dot.circle$ per particle and #thesandark 2 with $1050^3$ particles for a mass resolution of $2.96 dot 10^7 M_dot.circle$ per particle. Unless specified otherwise, we use #thesandark 2 since it provides a good compromise between mass resolution and computational cost. We make use of #thesandark 1 to perform convergence tests as described in @validation.


=== Main progenitor branch
#let notebook = json("../workdir/11_visualization/show_trees.ipynb")

Growth of structure in #lambdacdm is hierarchical: Small structures form first and merge to form larger structures. The growth of halos can be represented using merger trees. These tree-like structures describe the halo history in terms of the mergers of its smaller progenitors. A merger tree is constructed by linking halos in consecutive snapshots of the simulation, where each halo has a single descendant but potentially multiple progenitors.
// As described in ... THESAN

The main progenitor is the most massive progenitor and serves as a tracer of the halo mass history if we assume that the halo mass growth is dominated by mergers.
// Has this been explicitlyshown somewhere?
Beyond that, we expect the main progenitor to be most representative of the baryonic conditions inside and outside the halo as the merger occurs.
// Might need to reformulate
For the identification of accretion rates for #beorn we therefore focus solely on the main progenitor branch of each halo.

Reducing the breadth of the merger tree reduces the data volume significantly and allows us to implement the tree handling in memory without excessive computational cost. For this purpose we provide a simple implementation of a tree walker that copies the simplified trees to a single file for easier access. Other preprocessing is not required, which allows #beorn to keep all parameters related to the mass history as free parameters to be specified at runtime.
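
The idea of such a tree walker can be sketched in a few lines of Python. This is a simplified illustration and not the implementation shipped with #beorn: the arrays `first_progenitor` and `mass` are hypothetical stand-ins for the per-halo fields of a tree file, and we assume that `first_progenitor` stores the index of the main progenitor of each halo, or `-1` if there is none.

```python
import numpy as np

def main_progenitor_branch(start, first_progenitor, mass, n_max):
    """Follow main-progenitor pointers from halo `start`, at most n_max steps back.

    Returns the halo indices along the branch (starting halo first)
    and the corresponding masses.
    """
    indices = []
    current = start
    while current >= 0 and len(indices) <= n_max:
        indices.append(current)
        current = first_progenitor[current]
    indices = np.asarray(indices)
    return indices, mass[indices]

# Toy tree: halo 0 <- 1 <- 2 on the main branch; halo 3 is a minor progenitor of 0.
first_progenitor = np.array([1, 2, -1, -1])
mass = np.array([8.0, 5.0, 2.0, 1.0])
branch, branch_mass = main_progenitor_branch(0, first_progenitor, mass, n_max=10)
```

Only one pointer per halo is followed, so the minor progenitor (halo 3 in the toy tree) never enters the branch. This is what reduces the data volume compared to the full tree.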


=== Fitting procedure
Restricting the mass history to the main progenitor reduces it to a one-dimensional function of redshift, compatible with the original assumption of an exponential growth model as in @eq:exponential_growth.

#figure(
  image_cell(notebook, cell_id: "merger_tree_and_fitting"),
  caption: [
    Usage of merger tree fitting to obtain accretion rate estimates.
    _Left:_ Collection of normalized main progenitor branches with mass $M_"mp"$ starting at $z = 10.3$ and looking back over $n=10$ snapshots. Select histories and their corresponding exponential fits are highlighted.
    _Right:_ Distribution of best-fit accretion rates $alpha$ for all halos at $z=8.29$.
  ]

) <fig:merger_tree_and_fitting>

We use a linear regression in log-space to obtain estimates of the accretion rate $alpha$ for each halo. This is implemented in a vectorized fashion to allow for efficient processing of the full dataset. For this fit we enforce the current halo mass as a boundary condition. This prevents inconsistent fits where the latest fitted mass deviates from the actual current halo mass. As a visualization of the fitting procedure, @fig:merger_tree_and_fitting shows a collection of normalized main progenitor branches starting at $z=8.29$ and looking back over $n=10$ snapshots. After fitting, we overlay the estimated exponential growth history for a selection of halos. The right panel shows the distribution of best-fit accretion rates $alpha$ for all halos at $z=8.29$. Given the relatively low mass of the halos, we observe a strong clustering of accretion rates around a value of $alpha approx 0.5$. Outliers with significantly deviating values nevertheless appear and are not linked to a specific mass range.
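
A constrained fit of this kind can be sketched as follows. Taking the logarithm of @eq:exponential_growth and fixing the current mass turns the problem into a regression through the origin, which has a closed-form solution that vectorizes over all halos. The function name `fit_alpha` and the array layout are assumptions made for this illustration and need not match the actual implementation.

```python
import numpy as np

def fit_alpha(masses, redshifts):
    """Least-squares accretion rate per halo with the current mass held fixed.

    masses:    shape (n_halos, n_snap), column 0 is the current snapshot
    redshifts: shape (n_snap,), redshifts[0] is the current redshift
    """
    dz = redshifts - redshifts[0]            # redshift offsets, zero for column 0
    y = np.log(masses / masses[:, [0]])      # log mass ratio, zero in column 0
    # Regression through the origin for the model y = -alpha * dz
    return -(y @ dz) / np.dot(dz, dz)

# Synthetic histories with known accretion rates (illustrative values).
redshifts = np.array([8.0, 8.2, 8.4, 8.6])
true_alpha = np.array([0.5, 0.79])
masses = 1e9 * np.exp(-true_alpha[:, None] * (redshifts - redshifts[0]))
alpha = fit_alpha(masses, redshifts)
```

Because the fitted line is forced through the current snapshot, the only free parameter is the slope, and the fitted history reproduces the current halo mass by construction.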

Similarly to the halo mass itself, the accretion rate can then be taken into account during the painting procedure by selecting a profile corresponding to the halo mass and accretion rate of each halo. Consequently, the accretion rate is binned as well and the range that is covered during the painting is finite. We leave this as a free parameter to be specified at runtime.
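
The binning step can be illustrated with a short Python sketch. The range, the number of bins and the treatment of values outside the covered range (clipping to the outermost bins) are assumptions chosen for this example; in practice they are set at runtime.

```python
import numpy as np

alpha_edges = np.linspace(0.0, 2.0, 11)  # hypothetical range and bin count
alpha_centers = 0.5 * (alpha_edges[:-1] + alpha_edges[1:])

def alpha_bin_index(alpha, edges):
    """Index of the accretion-rate bin for each halo, clipped to the covered range."""
    clipped = np.clip(alpha, edges[0], edges[-1])
    idx = np.digitize(clipped, edges) - 1
    # A value equal to the last edge belongs to the last bin.
    return np.minimum(idx, len(edges) - 2)

alpha = np.array([-0.3, 0.5, 0.79, 5.0])
idx = alpha_bin_index(alpha, alpha_edges)
```

Each halo would then be painted with the profile computed for `alpha_centers[idx]` in its mass bin.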


== Resulting accretion rates
#let notebook = json("../workdir/11_visualization/evolution_of_alphas.ipynb")

#figure(
  image_cell(notebook, cell_id: "alpha_evolution_vs_redshift"),
  caption: [
    Evolution of the mean of the fitted accretion rates and the $1 sigma$ standard deviation (shaded area). For a given snapshot we consider different numbers of lookback snapshots $n$.
  ]
) <fig:alpha_evolution_vs_redshift>

In order to obtain a sensible range of $alpha$ values to cover during the painting procedure, we investigate the global result of the fitting procedure. Our method of fitting trades absolute precision for speed and convenience: Not all halos are well represented in the merger tree and no further processing is done beyond the consideration of the tree. Additionally, we need to account for unphysical or incomplete histories due to limitations of the halo finder. We discuss this step in @implementation. For the current investigation we disregard these halos and only consider well-behaved, fully resolved trees. @fig:alpha_evolution_vs_redshift shows how the fitted accretion rate evolves when starting from the different snapshots. We plot the mean and $1 sigma$ standard deviation of the resulting distribution of $alpha$ values. We consider different lookback lengths with the goal of assessing the stability of the fitting procedure.

We observe a clear stabilization of the mean accretion rate for longer lookbacks. Not only does it make sense to consider longer lookbacks because of their causal connection, but also because it helps to absorb short-term fluctuations most likely introduced by the halo finder. This is especially noticeable in the first few snapshots, where mean and $1 sigma$ uncertainty are significantly higher. This is probably due to the overabundance of low-mass halos whose mass history is more erratic and harder to reconstruct.

Numerically, the advantage of longer lookbacks is the stabilization of the fit leading to reduced scatter in the resulting distribution. We note that these behaviors stabilize once we consider around $n = 10$ snapshots of lookback. Both the mean and standard deviation follow a stable trend and the mean settles at $alpha approx 0.6$.
We attribute the slight offset of the means to the fact that discarding incomplete trees favors more massive halos at higher lookbacks. These halos are more stable in terms of detection by the halo finder and are expected to have fewer fluctuations.

Physically, the lookback time is motivated by the flux profiles of the halos themselves. Due to their size in the $"cMpc"$ range (see e.g. @fig:profile_plot_alpha_dependence), we attribute to each profile a timescale during which there is a causal effect on the region defined by the extent of that profile. For a profile of radius $#sym.tilde.op 100 "cMpc"$ this time is of the order of $Delta t = 300 "Myr"$, corresponding to $Delta z = 4$ (when looking back from a redshift of $z=8$). Given the spacing of snapshots in #thesan, using $n=10$ snapshots still lies below the causal range. Since the fitted behavior seems to stabilize, we suggest not going beyond $n = 10$, since the consideration of additional snapshots slows down the simulation considerably.
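
The correspondence between the redshift interval and the time interval quoted above can be checked with a short calculation. The sketch below uses the analytic age of a flat universe containing only matter and a cosmological constant, neglecting radiation. The values of `H0` and `OMEGA_M` are assumed, roughly Planck-like parameters and are not necessarily those of the simulation.

```python
import numpy as np

H0 = 67.7               # Hubble constant in km/s/Mpc (assumed)
OMEGA_M = 0.31          # matter density parameter (assumed)
KM_PER_MPC = 3.0857e19  # kilometres per megaparsec
S_PER_MYR = 3.1557e13   # seconds per megayear

def cosmic_age_myr(z):
    """Age of a flat matter + Lambda universe at redshift z, in Myr."""
    omega_l = 1.0 - OMEGA_M
    hubble_time_myr = KM_PER_MPC / H0 / S_PER_MYR
    x = np.sqrt(omega_l / OMEGA_M) * (1.0 + z) ** -1.5
    return 2.0 / (3.0 * np.sqrt(omega_l)) * hubble_time_myr * np.arcsinh(x)

# Time elapsed between z = 12 and z = 8, i.e. a lookback of Delta z = 4 from z = 8.
delta_t = cosmic_age_myr(8.0) - cosmic_age_myr(12.0)
```

With these assumed parameters, `delta_t` comes out slightly below $300 "Myr"$, consistent with the order of magnitude stated above.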