report completed and refined to my personal satisfaction

2025-09-15 03:26:26 +02:00
parent cafaf38ddf
commit 221bdcda07
14 changed files with 354 additions and 215 deletions


For each halo we require a flux profile that matches the halo properties which n
Since the dynamic range of accretion rates is large, the resulting parameter space rapidly expands. The computation of the profiles therefore utilizes vectorized operations to achieve reasonable runtimes.
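The vectorized evaluation can be sketched as follows (a minimal illustration with a toy power-law profile; the function name and profile form are hypothetical and not #beorn's actual model): all combinations of mass, accretion rate and radius are evaluated in a single broadcast operation instead of a loop.

```python
import numpy as np

def flux_profiles(masses, alphas, radii):
    """Toy profile F(r) ~ M * (1 + alpha) * r^-2, vectorized via broadcasting."""
    M = masses[:, None, None]      # shape (n_mass, 1, 1)
    a = alphas[None, :, None]      # shape (1, n_alpha, 1)
    r = radii[None, None, :]       # shape (1, 1, n_radius)
    return M * (1.0 + a) * r**-2   # broadcasts to (n_mass, n_alpha, n_radius)

profiles = flux_profiles(np.logspace(8, 12, 40),    # halo masses
                         np.linspace(0.0, 5.0, 50), # accretion rates alpha
                         np.logspace(-2, 1, 100))   # radial bins
print(profiles.shape)  # (40, 50, 100)
```

The expanded parameter space thus costs one large array instead of nested loops, which is what keeps the runtimes reasonable.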
Note that this introduces another "second degree" inconsistency: the flux profile attributes a radiative behavior to the halo that is motivated by its history. This is repeated for each snapshot, creating possibly conflicting histories. In the case of stable halo growth this is not a problem, but in the case of erratic growth (e.g. major mergers) it can lead to unphysical behavior. A more consistent approach would be to assume a more flexible mass growth model that distinguishes different regimes of growth. This would, however, require a much more complex handling of the precomputed profiles and is beyond the scope of this work. The current approach remains a good approximation for the majority of halos.
== Parallel painting of profile bins
Similarly to the computation of profiles, the painting step is affected by the increased parameter space. #beorn's fast simulation times revolve around the crucial simplification of the halo model: Halos with the same core properties are treated identically and can be mapped onto the grid in a single operation (see @painting). Through the addition of the accretion rate as a parameter the degeneracy of identical halos is reduced. The number of halos that can be treated simultaneously decreases even though they have the same mass. To mitigate this effect we implement a parallelized version of the painting step that distributes the workload to multiple processes
#footnote[
A rudimentary parallel implementation using the message passing interface (`MPI`) already exists. It leverages the fact that each snapshot can be processed independently and distributes the snapshots to multiple processes.
].
This implementation utilizes a shared memory approach and uses processes on a single node that share a common memory space to store the grid. This allows for a more efficient usage of node resources since the memory overhead of duplicating the grid for each process is avoided.
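The mechanism can be sketched with Python's `multiprocessing.shared_memory` (a minimal illustration: `paint_slab`, the grid size, and the slab decomposition are hypothetical, and the workers are invoked sequentially here rather than as separate processes):

```python
import numpy as np
from multiprocessing import shared_memory

# One shared buffer backs the grid; every painter process attaches to it
# by name instead of holding a private copy of the full grid.
shm = shared_memory.SharedMemory(create=True, size=64**3 * 8)
grid = np.ndarray((64, 64, 64), dtype=np.float64, buffer=shm.buf)
grid[:] = 0.0

def paint_slab(shm_name, z0, z1, value):
    """Worker: attach to the shared grid by name and paint one slab of it."""
    local = shared_memory.SharedMemory(name=shm_name)
    view = np.ndarray((64, 64, 64), dtype=np.float64, buffer=local.buf)
    view[:, :, z0:z1] += value  # disjoint slabs -> no write conflicts
    del view                    # release the buffer before closing the mapping
    local.close()

# In the real code these run in separate processes on one node.
paint_slab(shm.name, 0, 32, 1.0)
paint_slab(shm.name, 32, 64, 2.0)
mean_val = float(grid.mean())
print(mean_val)  # 1.5

del grid
shm.close()
shm.unlink()
```

Because all workers write into the same buffer, the memory footprint stays at one grid regardless of the process count.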
Part of the painting procedure remains inherently sequential. For instance, the final ionization map requires conservation of the total photon count, which is achieved by distributing duplicate ionizations from overlapping bubbles to neighboring cells. A parallel spreading approach might create new overlaps that require further iterations to resolve. We therefore perform this step in a single process and aim to keep such sequential computations to a minimum.
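A sketch of the conservation step, assuming a simple iterative scheme (not necessarily #beorn's exact algorithm): any ionized fraction above unity is handed in equal parts to the six neighboring cells, and the sweep repeats until no cell exceeds unity, which preserves the total photon count by construction.

```python
import numpy as np

def redistribute_excess(x_ion, max_iter=100):
    """Spread ionized fraction > 1 to the 6 face neighbors (periodic box)."""
    x = x_ion.copy()
    for _ in range(max_iter):
        excess = np.clip(x - 1.0, 0.0, None)
        if not excess.any():
            break
        x -= excess
        for axis in (0, 1, 2):
            for shift in (1, -1):
                x += np.roll(excess, shift, axis=axis) / 6.0
    return x

x = np.zeros((8, 8, 8))
x[4, 4, 4] = 3.0            # strongly overlapping bubbles in one cell
out = redistribute_excess(x)
print(out.max() <= 1.0, np.isclose(out.sum(), x.sum()))
```

The loop-carried dependence (each sweep depends on the overlaps created by the previous one) is what makes a parallel version hard to keep exactly photon-conserving.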
== Merger tree processing
The central improvement of the simulation procedure is the consideration of the individual halo mass accretion histories during the painting, rather than the assumption of a single predefined value. As described in @halo_mass_history we utilize the merger trees provided by the #thesan simulation. The inference of the accretion rate is performed at runtime; no further preprocessing of the simulation is required apart from a single step that merges the individual tree files into one file.
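The runtime inference can be illustrated under the exponential mass-growth parameterization $M(z) prop e^(-alpha z)$ (the function name is hypothetical and the concrete parameterization in #beorn may differ): two merger-tree snapshots of the same branch suffice to solve for $alpha$.

```python
import numpy as np

def accretion_rate(m_prog, z_prog, m_desc, z_desc):
    """alpha from a progenitor/descendant pair, assuming M(z) = M_0 exp(-alpha z).

    z_prog > z_desc; a mass decrease between snapshots yields a negative alpha,
    which the current model cannot represent (see the discussion below).
    """
    return np.log(m_desc / m_prog) / (z_prog - z_desc)

alpha = accretion_rate(m_prog=1e9, z_prog=8.0, m_desc=2e9, z_desc=7.5)
print(round(alpha, 3))  # ln(2) / 0.5 ~= 1.386
```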
The generated $alpha$ values are binned as a result of the painting procedure and the permitted range is restricted as specified in the configuration. For our runs we find that an upper limit of $alpha = 5$ only affects a sub-percent fraction of halos. Many of these halos exhibit erratic growth suggesting that allowing for very high accretion rates is not physical.
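The clipping and binning can be sketched as follows (bin edges, bin count, and sample values are hypothetical, not our actual configuration): restricting and discretizing $alpha$ is what lets halos sharing a bin still be painted together.

```python
import numpy as np

alpha_max = 5.0
edges = np.linspace(0.0, alpha_max, 41)    # 40 alpha bins (illustrative)
alphas = np.array([-0.3, 0.8, 2.1, 7.4])   # raw inferred values

clipped = np.clip(alphas, 0.0, alpha_max)  # negative / extreme growth is capped
bins = np.digitize(clipped, edges) - 1     # assign each halo to an alpha bin
bins = np.clip(bins, 0, len(edges) - 2)    # top edge falls into the last bin
print(bins)
```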
The #thesan data provides a convenient way to iterate and refine the above procedure but is not without shortcomings. The merger trees are constructed in postprocessing and do not guarantee self-consistency of halo properties across multiple snapshots. This manifests itself through negative growth rates that cannot be represented in the current model. Furthermore the mass resolution of the #thesandark simulations is apparently too coarse to accurately resolve halos down to the atomic cooling limit of $M_"h" = 10^8 M_dot.circle$. This is an issue that becomes apparent in @validation where we compare the impact of the different mass resolutions. To account for this we follow the description of star formation efficiency employed by @Schaeffer_2023, picking a "boosted" model for the description of our halos. The resulting parameters for @eq:star_formation_efficiency are $f_(star,0) = 0.1$, $M_p = 2.8 times 10^(10) M_dot.circle$, $gamma_1 = 0.49$ and $gamma_2 = -0.61$.
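As an illustration, a common double power-law form of the star formation efficiency evaluated with the quoted "boosted" parameters (the exact functional form and sign convention of $gamma_1$, $gamma_2$ should be checked against @eq:star_formation_efficiency; this sketch assumes the Mirocha-style normalization) peaks at $f_(star,0)$ for $M = M_p$ and is suppressed toward both mass ends.

```python
def f_star(M, f0=0.1, Mp=2.8e10, g1=0.49, g2=-0.61):
    """Double power-law efficiency, normalized so f_star(Mp) = f0 (assumed form)."""
    x = M / Mp
    return 2.0 * f0 / (x**(-g1) + x**(-g2))

print(f_star(2.8e10))  # 0.1 at the pivot mass M_p
print(f_star(1e8) < 0.1, f_star(1e12) < 0.1)  # suppressed at both ends
```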
== Secondary changes
In addition to the changes directly linked to the new accretion model we implement several improvements that lead to better usability and reproducibility of the simulation outputs.
We improve the input/output handling by implementing proper `HDF5` support and caching of intermediate results. This allows for a more efficient usage of disk space and faster loading times. It also enables the resumption of interrupted simulations.
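The cache-or-compute pattern behind the resumption support can be sketched as follows (all names are hypothetical, and a plain `.npz` file stands in for the actual `HDF5` storage to keep the sketch dependency-free): an intermediate result is recomputed only if it is not already on disk.

```python
import os
import tempfile
import numpy as np

def cached_profiles(path, compute):
    """Return cached profiles if present, else compute and persist them."""
    if os.path.exists(path):            # resume: reuse the intermediate result
        return np.load(path)["profiles"]
    profiles = compute()
    np.savez(path, profiles=profiles)   # persist for interrupted simulations
    return profiles

calls = []
def compute():
    calls.append(1)                     # count how often we actually compute
    return np.arange(10.0)

path = os.path.join(tempfile.mkdtemp(), "profiles.npz")
a = cached_profiles(path, compute)      # computes and stores
b = cached_profiles(path, compute)      # loads from cache
print(len(calls), np.array_equal(a, b)) # 1 True
```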
The import of data from the original #nbody simulation has been generalized to a reference class to ensure modularity and easy adaptation to other simulations. This has been part of a larger overhaul of the codebase to improve modularity and readability. #beorn aims to be a flexible framework that produces fast results that the end user can customize to reflect their parameter choices. Usability is therefore a key aspect of the code design.
A general speedup from the cumulative effect of the above changes and code optimizations results in a faster painting procedure. A contribution to that speedup comes from the usage of `Pylians` by @Pylians, which provides efficient implementations of the grid mapping of the individual particles. This additionally allows for a rigorous implementation of redshift space distortions (RSD) by utilizing the exact velocity information of each dark matter particle individually. Previous implementations of RSD in #beorn were based on approximations of the velocity field derived from the density field. The impact of RSD on the 21-cm signal has been discussed e.g. by @Ross_2021 but is not the focus of this work.
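The particle-level treatment of RSD can be sketched as follows (function name and numerical values are illustrative; in #beorn the shifted positions are subsequently passed to `Pylians`' mass-assignment routines): each particle's line-of-sight coordinate is displaced by $v_parallel \/ (a H)$ and wrapped periodically before the grid mapping.

```python
import numpy as np

def rsd_positions(pos, v_los, a, H, box):
    """Shift the line-of-sight coordinate by v/(aH) and wrap periodically.

    Toy units: positions in Mpc/h, velocities in km/s, H in km/s/Mpc.
    """
    s = pos.copy()
    s[:, 2] = (s[:, 2] + v_los / (a * H)) % box
    return s

rng = np.random.default_rng(0)
box, N = 100.0, 10_000
pos = rng.uniform(0.0, box, size=(N, 3))
v_los = rng.normal(0.0, 100.0, size=N)              # peculiar velocities
s = rsd_positions(pos, v_los, a=0.125, H=1000.0,    # toy values for z = 7
                  box=box)
print(s.shape, s[:, 2].min() >= 0.0, s[:, 2].max() < box)
```

Because the shift uses each particle's own velocity, no smoothed velocity field has to be reconstructed from the density field.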