#import "helpers.typ": *
|
|
|
|
= Implementation of changes <implementation>
|
|
|
|
This section describes the adaptations that were necessary in order to utilize the individual treatment of halo mass accretion histories in #beorn. We distinguish between necessary changes that were required to implement the underlying model and secondary changes that affect the quality of the simulation outputs indirectly.
== Profile generation taking into account halo mass history

For each halo we require a flux profile that matches its properties, which now include the accretion rate in addition to the mass and the redshift. The profiles are generated in a preprocessing step, following the redshifts of the snapshots and the mass and accretion bins defined in the configuration.

Since the dynamic range of accretion rates is large, the resulting parameter space grows rapidly. The computation of the profiles therefore relies on vectorized operations to achieve reasonable runtimes.
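The structure of this step resembles the sketch below, which assumes `numpy` broadcasting over the configured bins; the bin values, the toy profile and the name `compute_flux_profiles` are illustrative rather than the actual #beorn implementation.

```python
import numpy as np

# Hypothetical bin definitions mirroring the configuration (illustrative values).
z_snapshots = np.array([10.0, 9.0, 8.0])    # snapshot redshifts
mass_bins   = np.logspace(8, 12, 40)        # halo mass bins [M_sun]
alpha_bins  = np.linspace(0.1, 5.0, 25)     # accretion-rate bins
radii       = np.logspace(-2, 1.5, 200)     # radial sampling [cMpc]

def compute_flux_profiles(z, masses, alphas, r):
    """Vectorized stand-in for the per-bin flux-profile computation.

    Broadcasting evaluates all (mass, alpha, radius) combinations at once
    instead of looping over every bin individually.
    """
    M  = masses[:, None, None]   # shape (N_M, 1, 1)
    a  = alphas[None, :, None]   # shape (1, N_alpha, 1)
    rr = r[None, None, :]        # shape (1, 1, N_r)
    # Toy profile: emissivity grows with mass and accretion rate, falls off as 1/r^2.
    luminosity = 1e40 * (M / 1e10) * a
    return luminosity / (4.0 * np.pi * rr**2 * (1.0 + z) ** 2)

# One profile table per snapshot, with shape (N_M, N_alpha, N_r).
profile_tables = {z: compute_flux_profiles(z, mass_bins, alpha_bins, radii)
                  for z in z_snapshots}
```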
Note that the precomputation of the profiles introduces another "second degree" inconsistency: the flux profile attributes to the halo a radiative behavior that is motivated by its past growth. Because this is repeated for each snapshot, the histories assumed for the same halo at different snapshots may conflict. For stable halo growth this is not a problem, but for erratic growth (e.g. major mergers) it can lead to unphysical behavior. A more consistent approach would be to adopt a more flexible mass growth model that distinguishes between different growth regimes; this would require a considerably more complex handling of the precomputed profiles and is beyond the scope of this work. The current approach remains a good approximation for the majority of halos.
== Parallel painting of profile bins

Similarly to the computation of the profiles, the painting step is affected by the enlarged parameter space. #beorn's fast simulation times rely on a crucial simplification of the halo model: halos with the same core properties are treated identically and can be mapped onto the grid in a single operation (see @painting). Adding the accretion rate as a parameter reduces this degeneracy, so fewer halos can be treated simultaneously even when they have the same mass. To mitigate this effect we implement a parallelized version of the painting step that distributes the workload across multiple processes#footnote[
  A rudimentary parallel implementation using the message passing interface (`MPI`) already exists. It leverages the fact that each snapshot can be processed independently and distributes the snapshots to multiple processes.
].

This implementation follows a shared-memory approach: processes on a single node share a common memory space in which the grid is stored. This allows for a more efficient use of node resources, since the memory overhead of duplicating the grid for each process is avoided. The pre- and postprocessing required to ensure the correct execution of all processes is justified by the performance gain, which is nearly linear in the number of processes used#footnote[
  We test the scaling with a parallelization of up to 70 processes and observe a continuous speedup. Part of this speedup is absorbed by the overhead of the much larger number of bins.
].
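The shared-memory pattern can be sketched as follows with Python's `multiprocessing.shared_memory`; the grid size, the task layout and the function names are illustrative and not the actual #beorn code.

```python
import numpy as np
from multiprocessing import Pool, shared_memory

GRID_SHAPE = (128, 128, 128)
GRID_DTYPE = np.float64

def paint_bin(args):
    """Paint all halos of one (mass, alpha) bin onto the shared grid.

    Every worker attaches to the same shared-memory block, so the grid is not
    copied into each process. Collisions between simultaneous writes to the
    same cell are ignored in this simplified sketch.
    """
    shm_name, cells, profile_value = args
    shm = shared_memory.SharedMemory(name=shm_name)
    grid = np.ndarray(GRID_SHAPE, dtype=GRID_DTYPE, buffer=shm.buf)
    for x, y, z in cells:
        grid[x, y, z] += profile_value   # simplified single-cell "painting"
    shm.close()

if __name__ == "__main__":
    nbytes = int(np.prod(GRID_SHAPE)) * np.dtype(GRID_DTYPE).itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    grid = np.ndarray(GRID_SHAPE, dtype=GRID_DTYPE, buffer=shm.buf)
    grid[:] = 0.0

    # One task per (mass, alpha) bin: halo cell indices plus the bin's profile amplitude.
    rng = np.random.default_rng(0)
    tasks = [(shm.name, rng.integers(0, 128, size=(100, 3)), float(amp))
             for amp in np.linspace(0.1, 1.0, 50)]

    with Pool(processes=8) as pool:
        pool.map(paint_bin, tasks)

    print("total painted flux:", grid.sum())
    shm.close()
    shm.unlink()
```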
Part of the painting procedure remains inherently sequential. For instance, the final ionization map requires conservation of the total photon count, which is achieved by redistributing duplicate ionizations from overlapping bubbles to neighboring cells. A parallel spreading approach might create new overlaps that would require further iterations to resolve. We therefore perform this step in a single process and aim to keep such inefficient computations to a minimum.
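A heavily simplified version of this redistribution could look as follows; the grid, the tolerance and the equal split among the six face neighbors are illustrative choices rather than the exact scheme used in #beorn.

```python
import numpy as np

def redistribute_excess_ionization(xion, max_iter=50, tol=1e-6):
    """Spread ionized fractions above 1 to the six face neighbours (periodic box).

    Overlapping bubbles can push a cell above full ionization; the surplus is
    handed to adjacent cells until (almost) no cell exceeds 1, which conserves
    the total photon count.
    """
    for _ in range(max_iter):
        excess = np.clip(xion - 1.0, 0.0, None)
        if excess.max() <= tol:
            break
        xion = xion - excess
        # Distribute the surplus equally to the six face neighbours.
        for axis in range(3):
            for shift in (+1, -1):
                xion = xion + np.roll(excess, shift, axis=axis) / 6.0
    return np.clip(xion, 0.0, 1.0)

# Example: a 64^3 box with a small region of strongly overlapping "bubbles".
xion = np.zeros((64, 64, 64))
xion[30:34, 30:34, 30:34] = 2.5
xion = redistribute_excess_ionization(xion)
```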
== Merger tree processing

The central improvement of the simulation procedure is that the painting now takes the individual halo mass accretion histories into account instead of assuming a single predefined value. As described in @halo_mass_history, we use the merger trees provided by the #thesan simulation. The inference of the accretion rate is performed at runtime; no further preprocessing of the simulation is required apart from a single step that merges the individual tree files into one file.
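As a sketch of this runtime inference, and assuming for illustration the exponential mass-growth parametrization $M(z) = M_0 e^(-alpha (z - z_0))$, the accretion-rate parameter of a halo follows directly from two consecutive tree entries:

```python
import numpy as np

def infer_alpha(m_now, m_prev, z_now, z_prev):
    """Accretion-rate parameter from two consecutive merger-tree entries.

    Assumes M(z) = M_0 * exp(-alpha * (z - z_0)); z_prev > z_now and
    m_prev <= m_now for a growing halo, so alpha >= 0.
    """
    return np.log(m_now / m_prev) / (z_prev - z_now)

# Example: a halo growing from 6e9 to 1e10 M_sun between z = 7.5 and z = 7.0.
alpha = infer_alpha(1.0e10, 6.0e9, 7.0, 7.5)   # roughly 1.0
```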
The generated $alpha$ values are binned as part of the painting procedure, and the permitted range is restricted as specified in the configuration. For our runs we find that an upper limit of $alpha = 5$ only affects a sub-percent fraction of halos. Many of these halos exhibit erratic growth, suggesting that such very high inferred accretion rates are not physical anyway.
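In code, the restriction and binning amounts to something like the following, with illustrative limits and bin counts in place of the actual configuration values:

```python
import numpy as np

# Illustrative configuration values.
alpha_min, alpha_max, n_bins = 0.1, 5.0, 25
alpha_bins = np.linspace(alpha_min, alpha_max, n_bins)

# Per-halo accretion rates inferred from the merger tree (example values).
alphas = np.array([0.3, 1.7, 8.2, 4.9])

alphas = np.clip(alphas, alpha_min, alpha_max)      # cap erratic growth at alpha = 5
bin_index = np.digitize(alphas, alpha_bins) - 1     # profile bin assigned to each halo
bin_index = np.clip(bin_index, 0, n_bins - 1)
```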
The #thesan data provides a convenient way to iterate and refine the above procedure, but it is not without shortcomings. The merger trees are constructed in postprocessing and do not guarantee self-consistency of halo properties across multiple snapshots. This manifests itself in negative growth rates that cannot be represented in the current model. Furthermore, the mass resolution of the #thesandark simulations is apparently too coarse to accurately resolve halos down to the atomic cooling limit of $M_"h" = 10^8 M_dot.circle$, an issue that becomes apparent in @validation, where we compare the impact of the different mass resolutions. To account for this we follow the description of the star formation efficiency employed by @Schaeffer_2023, picking a "boosted" model for our halos. The resulting parameters for @eq:star_formation_efficiency are $f_(star,0) = 0.1$, $M_p = 2.8 times 10^(10) M_dot.circle$, $gamma_1 = 0.49$ and $gamma_2 = -0.61$.
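For reference, with these parameters the efficiency can be evaluated as in the sketch below, assuming the double power-law form of @eq:star_formation_efficiency; the function name and the printed example are illustrative.

```python
import numpy as np

# "Boosted" star-formation efficiency parameters quoted above.
F_STAR0 = 0.1
M_PIVOT = 2.8e10     # [M_sun]
GAMMA_1 = 0.49
GAMMA_2 = -0.61

def f_star(m_halo):
    """Double power-law efficiency, peaking near the pivot mass M_PIVOT.

    Assumed form: f_star(M) = 2 * f_star0 / ((M/M_p)**gamma_1 + (M/M_p)**gamma_2).
    """
    x = np.asarray(m_halo) / M_PIVOT
    return 2.0 * F_STAR0 / (x**GAMMA_1 + x**GAMMA_2)

# Example: efficiency at the atomic cooling limit and at the pivot mass.
print(f_star(1e8), f_star(2.8e10))   # strongly suppressed vs. ~0.1
```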
== Secondary changes

In addition to the changes directly linked to the new accretion model, we implement several improvements that lead to better usability and reproducibility of the simulation outputs.

We improve the input/output handling by implementing proper `HDF5` support and caching of intermediate results. This allows for a more efficient use of disk space and faster loading times, and it enables the resumption of interrupted simulations.
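The caching follows a simple pattern like the one sketched below; the helper name `cached_dataset` and the file layout are illustrative, not the actual #beorn interface.

```python
import h5py
import numpy as np

def cached_dataset(path, name, compute):
    """Return a dataset from an HDF5 cache, computing and storing it if missing.

    Intermediate grids written this way let an interrupted run resume from the
    last snapshot that was fully processed.
    """
    with h5py.File(path, "a") as f:
        if name in f:
            return f[name][...]
        data = compute()
        f.create_dataset(name, data=data, compression="gzip")
        return data

# Example: the ionization grid of one snapshot, recomputed only if not cached yet.
xion = cached_dataset("beorn_cache.h5", "xion/z7.00",
                      lambda: np.zeros((128, 128, 128), dtype=np.float32))
```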
The import of data from the original #nbody simulation has been generalized into a reference class to ensure modularity and easy adaptation to other simulations. This has been part of a larger overhaul of the codebase aimed at improving modularity and readability. #beorn aims to be a flexible framework that produces fast results which the end user can customize to reflect their parameter choices; usability is therefore a key aspect of the code design.
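Conceptually, such a reference class defines the interface that every simulation-specific loader has to provide; the sketch below illustrates this pattern and is not the actual #beorn class.

```python
from abc import ABC, abstractmethod
import numpy as np

class SimulationReader(ABC):
    """Interface that a simulation-specific loader is expected to implement."""

    @abstractmethod
    def redshifts(self) -> np.ndarray:
        """Snapshot redshifts available in the simulation."""

    @abstractmethod
    def load_halos(self, z: float) -> dict:
        """Return halo positions, masses and accretion rates at redshift z."""

class ThesanReader(SimulationReader):
    """Minimal example loader; a real reader would parse the simulation outputs."""

    def redshifts(self) -> np.ndarray:
        return np.array([8.0, 7.5, 7.0])

    def load_halos(self, z: float) -> dict:
        return {"pos": np.zeros((0, 3)), "mass": np.zeros(0), "alpha": np.zeros(0)}
```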
A general speedup of the painting procedure results from the cumulative effect of the above changes and further code optimizations. One contribution comes from the use of the `pylians` package by @Pylians, which provides efficient implementations of the mapping of individual particles onto the grid. This additionally allows for a rigorous implementation of redshift space distortions (RSD) that uses the exact velocity information of each dark matter particle individually; previous implementations of RSD in #beorn were based on approximations of the velocity field derived from the density field. The impact of RSD on the 21-cm signal has been discussed e.g. by @Ross_2021 but is not the focus of this work.
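As an illustration of this particle-level treatment, the following sketch shifts each particle along the line of sight by its peculiar velocity before a cloud-in-cell deposit with the `MAS_library` module of `pylians`; the box size, cosmology values and particle data are placeholders.

```python
import numpy as np
import MAS_library as MASL   # mass-assignment routines from the pylians package

BoxSize, grid = 100.0, 256        # comoving box [Mpc/h] and grid resolution (illustrative)
z, Hz, h = 7.0, 700.0, 0.67       # redshift, H(z) [km/s/Mpc], little h (illustrative)

# Placeholder particle data: comoving positions [Mpc/h] and peculiar velocities [km/s].
rng = np.random.default_rng(1)
pos = rng.uniform(0.0, BoxSize, size=(100_000, 3)).astype(np.float32)
vel = rng.normal(0.0, 200.0, size=(100_000, 3)).astype(np.float32)

# Redshift-space distortions: shift every particle along the line of sight (z-axis)
# by its exact peculiar velocity, then wrap back into the periodic box.
pos[:, 2] += vel[:, 2] * (1.0 + z) / Hz * h
pos[:, 2] %= BoxSize

# Cloud-in-cell deposit of the displaced particles onto the density grid.
delta = np.zeros((grid, grid, grid), dtype=np.float32)
MASL.MA(pos, delta, BoxSize, 'CIC')
delta = delta / delta.mean() - 1.0    # convert particle counts to density contrast
```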