more results, first corrective fixes

2025-09-14 02:38:29 +02:00
parent c43b64ecb7
commit cafaf38ddf
14 changed files with 265 additions and 297 deletions
--- a/halo_mass_history.typ
+++ b/halo_mass_history.typ
@@ -7,7 +7,7 @@ This section shows the impact of the halo growth on the resulting radiation prof
 // Don't like refined simulation


-== Modelling mass accretion
+== Modeling mass accretion

 As described in @hmreio the fundamental assumption of #beorn is the halo model of reionization by @schneider2023cosmologicalforecast21cmpower.
 // no need to recite?
@@ -21,17 +21,19 @@ In this simplified model, for a given star formation efficiency
 the halo mass history is the single most impactful property besides the mass itself.


-#beorn's goal is to provide simulations of the map-level contributions to the $21 "cm"$ signal, meaning that we cannot rely on a distribution of halo masses and accretion rates alone. Instead #beorn leverages large scale N-body simulations to provide a spatial distribution of halos. For the first iteration halo growth was modelled through an exponential growth model
+#beorn's goal is to provide simulations of the map-level contributions to the 21-cm signal, meaning that we cannot rely on a distribution of halo masses and accretion rates alone. Instead, #beorn leverages large-scale N-body simulations to provide a spatial distribution of halos. For the first iteration, halo growth was modeled through an exponential growth model
 $
  M_"h" (z) = M_"h" (z_0) dot exp[-alpha (z - z_0)]
 $ <eq:exponential_growth>
-where $alpha = - dot(M_"h") / M_"h"$ is a free parameter describing the specific mass accretion rate. Following `@???` a value of $alpha = 0.79$ was used as a fiducial value for all halos, independent of their mass or redshift. This meant that the requirements on the simulation data were minimal: Only a single halo catalog at a given redshift was required to generate a map at that redshift.
+where $alpha = - dot(M_"h") / M_"h"$ is a free parameter describing the specific mass accretion rate. Following `@???`
+// TODODO
+a value of $alpha = 0.79$ was used as a fiducial value for all halos, independent of their mass or redshift. This meant that the requirements on the simulation data were minimal: Only a single halo catalog at a given redshift was required to generate a map at that redshift.
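
As an illustration of @eq:exponential_growth, the following minimal sketch evaluates the exponential growth model for a single halo. The function name `halo_mass` and the example halo ($10^10 M_dot.circle$ at $z_0 = 8$) are hypothetical and only serve as an illustration. The default $alpha = 0.79$ is the fiducial value quoted above.

```python
import numpy as np


def halo_mass(z, m0, z0, alpha=0.79):
    """Exponential growth model: M_h(z) = M_h(z0) * exp(-alpha * (z - z0))."""
    z = np.asarray(z, dtype=float)
    return m0 * np.exp(-alpha * (z - z0))


# Hypothetical example: a halo of 1e10 solar masses at z0 = 8,
# traced back to higher redshifts where it was less massive.
masses = halo_mass([8.0, 9.0, 10.0], m0=1e10, z0=8.0)
```

Since $alpha$ is fixed, a single halo catalog at $z_0$ suffices to evaluate the full mass history in this sketch.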

 Using a simple exponential growth model is a significant simplification of the complex process of halo growth
 // maybe a citation
 but the most obvious
 // maybe a better word
-limitation is the assumption of a constant accretion rate $alpha$ for all halos, independently of their position, mass or redshift. In a realistic scenario we expect to observe a correlation both with halo mass and redshift, in addition to the stochasticity of the accretion process. From a statistical perspective this has been investigated by @Schneider_2021 who also consider a halo growth following the extended Press-Schechter formalism. This more detailed treatment shows that in particular small scales deviate from the simple exponential growth model. From a simulation perspective an even more precise treatment is possible since the growth history of each halo is already encoded in the successive snapshots of the N-body simulation. Ignoring this information introduces inconsistencies by painting halos using profiles that might not reflect their actual growth history.
+limitation is the assumption of a constant accretion rate $alpha$ for all halos, independent of their position, mass or redshift. In a realistic scenario we expect to observe a correlation with both the halo mass and redshift, in addition to the stochasticity of the accretion process. From a statistical perspective, this has been investigated by @Schneider_2021, who also consider halo growth following the extended Press-Schechter formalism. This more detailed treatment shows that small scales in particular deviate from the simple exponential growth model. From a simulation perspective, an even more precise treatment is possible since the growth history of each halo is already encoded in the successive snapshots of the N-body simulation. Ignoring this information introduces inconsistencies by painting halos using profiles that might not reflect their actual growth history.


 // In a purely formal investigation where a qualitative prediction is derived from a well-defined halo mass distribution, the mass history is simply obtained as a direct derivation from the mass distribution. The simulations made by #beorn aim to provide 3D data that allows for quantitative conclusions. To this end a spatial distribution of the halo mass history is required, as provided by large scale simulations
@@ -41,8 +43,8 @@ limitation is the assumption of a constant accretion rate $alpha$ for all halos,
 // // Cite pkdgrav, Illustris, THESAN
 // .

-@Schneider_2021 already compared exp growth to other models and found that following a more rigorous EPS approach show less growth at small masses.
-// e.g. papers like "2309...." suggest a revised halo mass growth.
+// @Schneider_2021 already compared exp growth to other models and found that following a more rigorous EPS approach show less growth at small masses.
+// // e.g. papers like "2309...." suggest a revised halo mass growth.



@@ -67,7 +69,7 @@ In order to illustrate the necessity of a more precise treatment of the halo mas
 // TODO - how far should I comment on that?
 This picture is more complex once we consider a distribution of accretion rates instead of a single value.
 // Do I need to show a plot of that as well?
-We note that the dominating factor when considering a distribution is the contribution from the mean accretion rate - the scatter around the mean has a significantly smaller effect. We do not pursue the stochasticity of the accretion rate since the usage of n-body simulations allows for a more sophisticated investigation. Instead of assuming pure stochasticity we can extract the actual growth history of each halo and use it to assign a more meaningful accretion rate.
+We note that the dominating factor when considering a distribution is the contribution from the mean accretion rate. The scatter around the mean has a significantly smaller effect. We do not elaborate on the stochasticity of the accretion rate since the usage of #nbody simulations allows for a more sophisticated investigation. Instead of assuming pure stochasticity we can extract the actual growth history of each halo and use it to assign a more meaningful accretion rate.



@@ -75,19 +77,19 @@ We note that the dominating factor when considering a distribution is the contri

 === Using THESAN

-In order to generate precise map-level predictions of the 21cm signal, #beorn combines the halo model of reionization with large scale N-body simulations which provide realistic snapshots of the dark matter distribution. They give a spatial context to the generated profiles.
+In order to generate precise map-level predictions of the 21-cm signal, #beorn combines the halo model of reionization with large-scale #nbody simulations which provide realistic snapshots of the dark matter distribution. They give a spatial context to the generated profiles.

-As described in @procedure #beorn was initially used to post-process the #pkdgrav
+As described in @procedure #beorn was initially used to postprocess the #pkdgrav
 // cite!
-simulation suite and obtain a meaningful signal capable of constraining astrophysical parameters related to star formation. The aim of this thesis is not to merely increase the precision but to leverage the mass history that can be extracted directly from the simulation to refine the underlying model.
+simulation suite and to obtain a meaningful signal capable of constraining astrophysical parameters related to star formation. The aim of this thesis is not to merely increase the precision but to leverage the mass history that can be extracted directly from the simulation to refine the underlying model.

 To this end, we use the publicly available data from the #thesan simulation suite
 #cite(<Kannan_2021>, form: "normal")
 #cite(<Garaldi_2022>, form: "normal")
 #cite(<Smith_2022>, form: "normal")
-. The #thesandark simulation in particular provides a dark matter only simulation and already provides halo catalogs and merger trees generated by the `LHaloTree` tree builder by @Springel2005. This will allow us to extract the growth of each halo accross different snapshots without signifcant preprocessing.
+. The #thesandark simulation in particular is a dark-matter-only simulation and already provides halo catalogs and merger trees generated by the `LHaloTree` tree builder by @Springel2005. This will allow us to extract the growth of each halo across different snapshots without significant preprocessing.

-With a box length of $95.5 "cMpc"$ it provides a sufficient volume to avoid box size effects
+With a box length of $95.5 "cMpc"$ the simulation provides a sufficient volume to avoid box size effects
 // CITATION
 while still allowing us to iterate quickly and test the refined model without excessive computational cost. The simulation has two variants with different mass resolutions:
 #thesandark 1 with $2100^3$ particles for a mass resolution of $3.70 dot 10^6 M_dot.circle$ per particle and #thesandark 2 with $1050^3$ particles for a mass resolution of $2.96 dot 10^7 M_dot.circle$ per particle. Unless specified otherwise we use #thesandark 2 since it provides a good compromise between resolution and computational cost. We make use of #thesandark 1 to perform convergence tests as described in @validation.
@@ -95,41 +97,38 @@ while still allowing us to iterate quickly and test the refined model without ex


 // TODO - below
-@Kannan_2021 also shows that reionization history is different for different gas densitites, i.e. halo masses. We also show from a profile perspective that treating halo accretion as a free parameter can lead to significant differences in the resulting profiles.
+// @Kannan_2021 also shows that reionization history is different for different gas densitites, i.e. halo masses. We also show from a profile perspective that treating halo accretion as a free parameter can lead to significant differences in the resulting profiles.


-// The simulation has two main limitations: First, the mass resolution of $3.12 * 10^7 "M_⊙"$ means that halos below a mass of $10^9 "M_⊙"$ are not resolved. This is particularly relevant as these low mass halos are expected to contribute significantly to the ionizing photon budget at high redshifts #cite(<Kannan_2021>, form: "normal"). To account for this, we use boosted models of star formation efficiency as described in section <sf_efficiency>. Second, the simulation only provides snapshots down to a redshift of $z=5.5$. As reionization is expected to be completed by this time, this does not impact our results.
+// Thesan halo catalog and the motivation to increase the cutoff.

-Thesan halo catalog and the motivation to increase the cutoff.
-
-At the same time THESAN low mass halos seem overabundant which is why we use boosted models of star formation efficiency.
+// At the same time THESAN low mass halos seem overabundant which is why we use boosted models of star formation efficiency.



 === Main progenitor branch
 #let notebook = json("../workdir/11_visualization/show_trees.ipynb")

-Growth of structure in #lambdacdm is hierarchical: Small structures form first and merge to form larger structures. The growth of halos is reflected in merger trees. The central representation of halo mass evolution is given by merger trees.
-These tree-like structures describe the halo history in terms of the mergers of its smaller progenitors. A merger tree is constructed by linking halos in consecutive snapshots of the simulation where each halo as a single descendant but potentially multiple progenitors.
+Growth of structure in #lambdacdm is hierarchical: Small structures form first and merge to form larger structures. The growth of halos can be represented using merger trees. These tree-like structures describe the halo history in terms of the mergers of its smaller progenitors. A merger tree is constructed by linking halos in consecutive snapshots of the simulation, where each halo has a single descendant but potentially multiple progenitors.
 // As described in ... THESAN

 The main progenitor serves as a tracer of the halo mass history if we assume that the halo mass growth is dominated by mergers.
-// Has this been shown to be true somewhere?
+// Has this been explicitly shown somewhere?
 Beyond that, we expect the main progenitor to be most representative of the baryonic conditions inside and outside the halo as the merger occurs.
 // Might need to reformulate
 For the identification of accretion rates for #beorn we therefore focus solely on the main progenitor branch of each halo.

-Reducing the breadth of the merger tree reduces the data volume significantly and allows us to implement the tree handling in memory without excessive computational cost. To this end we provide a simple implementation of a tree walker that copies the simplified trees to a single file for easier access. Other preprocessing is not required which allows #beorn to keep all parameters related to the mass history as free parameters to be specified at runtime.
+Reducing the breadth of the merger tree reduces the data volume significantly and allows us to implement the tree handling in memory without excessive computational cost. To this end, we provide a simple implementation of a tree walker that copies the simplified trees to a single file for easier access. Other preprocessing is not required which allows #beorn to keep all parameters related to the mass history as free parameters to be specified at runtime.


 === Fitting procedure
-The restriction to the main progenitor means that we reduce the dimensionality of the mass history to a one-dimensional function of redshift compatible with the orginal assumption of an exponential growth model @eq:exponential_growth.
+The restriction to the main progenitor reduces the mass history to a one-dimensional function of redshift, compatible with the original assumption of an exponential growth model as in @eq:exponential_growth.

 #figure(
  image_cell(notebook, cell_id: "merger_tree_and_fitting"),
  caption: [
    Usage of merger tree fitting to obtain accretion rate estimates.
-    _Left:_ Collection of normalized main progenitor branches starting at $z = 10.3$ and looking back over $n=10$ snapshots. Select histories and their corresponding exponential fits are highlighted.
+    _Left:_ Collection of normalized main progenitor branches with mass $M_"mp"$ starting at $z = 10.3$ and looking back over $n=10$ snapshots. Select histories and their corresponding exponential fits are highlighted.
    _Right:_ Distribution of best-fit accretion rates $alpha$ for all halos at $z=10.3$.
  ]

@@ -137,10 +136,7 @@ The restriction to the main progenitor means that we reduce the dimensionality o

 We use a linear regression in log-space to obtain estimates of the accretion rate $alpha$ for each halo. This is implemented in a vectorized fashion to allow for efficient processing of the full dataset. For this fit we enforce the current halo mass as a boundary condition. This prevents inconsistent fits where the latest fitted mass deviates from the actual current halo mass. As a visualization of the fitting procedure @fig:merger_tree_and_fitting shows a collection of normalized main progenitor branches starting at $z=10.3$ and looking back over $n=10$ snapshots. After fitting we overlay the estimated exponential growth history for a selection of halos. The right panel shows the distribution of best-fit accretion rates $alpha$ for all halos at $z=10.3$.
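
The constrained fit described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in #beorn: the function name `fit_alpha`, the array layout (one row per halo, column 0 holding the current snapshot) and the synthetic redshift grid are hypothetical. Enforcing the current halo mass as a boundary condition turns the log-space regression into a least-squares fit of a line through the origin, which has a closed-form slope and vectorizes over all halos.

```python
import numpy as np


def fit_alpha(masses, redshifts):
    """Fit ln(M(z) / M(z0)) = -alpha * (z - z0) for every halo at once.

    masses:    array of shape (n_halos, n_snapshots), column 0 = current snapshot
    redshifts: array of shape (n_snapshots,), entry 0 = current redshift z0
    """
    masses = np.asarray(masses, dtype=float)
    z = np.asarray(redshifts, dtype=float)
    dz = z - z[0]                               # zero at the current snapshot
    log_ratio = np.log(masses / masses[:, :1])  # zero in column 0 by construction
    # Least squares for a line through the origin: slope = sum(x*y) / sum(x*x).
    # The model slope is -alpha, hence the leading minus sign.
    return -(log_ratio @ dz) / (dz @ dz)


# Synthetic check with assumed values: three halos with known accretion rates,
# starting at z = 10.3 and looking back over n = 10 snapshots.
redshifts = np.linspace(10.3, 13.0, 10)
true_alpha = np.array([0.4, 0.6, 0.8])
current_mass = np.array([1e9, 5e9, 2e10])
masses = current_mass[:, None] * np.exp(
    -true_alpha[:, None] * (redshifts - redshifts[0])
)
alpha_fit = fit_alpha(masses, redshifts)
```

Because the fitted curve passes through the current mass by construction, the latest fitted mass cannot deviate from the catalog value.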

-// TODO - determination of lookback
-How we determine lookback
-
-Similarly to the halo mass itself the accretion rate can then be taken into account during the painting procedure by selecting a profile corresponding to the halo mass and accretion rate of each halo. This means that the accretion rate is binned as well and the range that is covered during the painting is finite. We leave this as a free parameter to be specified at runtime.
+Similarly to the halo mass itself the accretion rate can then be taken into account during the painting procedure by selecting a profile corresponding to the halo mass and accretion rate of each halo. Hence the accretion rate is binned as well and the range that is covered during the painting is finite. We leave this as a free parameter to be specified at runtime.



@@ -150,18 +146,15 @@ Similarly to the halo mass itself the accretion rate can then be taken into acco
 #figure(
  image_cell(notebook, cell_id: "alpha_evolution_vs_redshift"),
  caption: [
-    #lorem(50)
+    Evolution of the mean of the fitted accretion rates and the $1 sigma$ standard deviation (shaded area). For a given snapshot we consider different numbers of snapshots $n$.
  ]
 ) <fig:alpha_evolution_vs_redshift>
-In order to obtain a sensible range of values to cover during the painting procedure, we investigate the global result of the fitting procedure. Our method of fitting trades speed and convenience for absolute precision: far from all halos are well represented in the merger tree. Many of the histories are unphysical or incomplete and we describe their treatment in @implementation. For the current investigation we disregard these halos and only consider well-behaved, fully reolved trees. @fig:alpha_evolution_vs_redshift shows how the fitted accretion rate evolves when starting from the different snapshots. We plot the mean and $1 sigma$ standard deviation of the resulting distribution of $alpha$ values. We consider different lookback lengths with the goal of assessing the stability of the fitting procedure.

-We observe a clear stabilization of the mean accretion for longer lookbacks. Not only does it make sense to consider longer lookbacks because of their causal connection but this also helps to absorb short term fluctuations introduced most likely by the halo finder. Another advantage of longer lookbacks is the reduced scatter in the resulting distribution which is due to the removal of incomplete, highly fluctuating trees. We note that these behaviors stabilize once we consider aroun 10 snapshots of lookback. Both the mean and standard deviation follow a stable trend and the mean settles at $alpha approx 0.6$.
+In order to obtain a sensible range of $alpha$ values to cover during the painting procedure, we investigate the global result of the fitting procedure. Our method of fitting trades speed and convenience for absolute precision: Not all halos are well represented in the merger tree. Additionally, we need to account for unphysical or incomplete histories due to limitations of the halo finder. We discuss this step in @implementation. For the current investigation we disregard these halos and only consider well-behaved, fully resolved trees. @fig:alpha_evolution_vs_redshift shows how the fitted accretion rate evolves when starting from different snapshots. We plot the mean and $1 sigma$ standard deviation of the resulting distribution of $alpha$ values. We consider different lookback lengths with the goal of assessing the stability of the fitting procedure.

-// comment on the high tail at high redshifts
-// explain how the dip at z=14 is probably more meaningful
-// comment on the contrast to $alpha = 0.79$ chosen before
+We observe a clear stabilization of the mean accretion rate for longer lookbacks. Longer lookbacks make sense not only because of their causal connection but also because they help to absorb short-term fluctuations most likely introduced by the halo finder. These fluctuations are especially noticeable in the first few snapshots, where the $1 sigma$ uncertainty is significantly higher. This is likely due to the overabundance of low-mass halos whose mass history is more erratic and harder to reconstruct, accentuated by displacements of the halos.

-Should explain why different lookbacks produce slightly offset estimates
-// Is this due to the fact that we discard cutoff trees (mostly due to _low_ masses)
+Numerically, the advantage of longer lookbacks is the stabilization of the fit leading to reduced scatter in the resulting distribution. We note that these behaviors stabilize once we consider around $n = 10$ snapshots of lookback. Both the mean and standard deviation follow a stable trend and the mean settles at $alpha approx 0.6$.
+We attribute the slight offset of the means to the fact that discarding incomplete trees favors more massive halos at higher lookbacks. These halos are more stable in terms of detection by the halo finder and are expected to have fewer fluctuations.

-We attribute the slight offset of the means to the fact that the discarding incomplete trees favors more massive halos at higher lookbacks. These halos are more stable in terms of detection by the halo finder and are expected to have lower fluctuations.
+Physically, the lookback time is motivated by the flux profiles of the halos themselves. Because the profiles extend up to the $"Mpc"$ range (see e.g. @fig:profile_plot_alpha_dependence), we attribute to each profile a timescale that causally affects the region defined by that profile. For a profile of radius $ #sym.tilde.op 100 "Mpc"$ this time is on the order of $Delta t = 300 "Myr" #sym.arrow.l.r.double.long Delta z = 4$ (when looking back from a redshift of $z=8$). Given the spacing of snapshots in #thesan, using $n=10$ snapshots still lies below the causal range. Since the fitted behavior seems to stabilize, we suggest not going beyond that, since the consideration of additional snapshots slows down the simulation considerably.
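
The correspondence between $Delta z = 4$ and a time interval of a few hundred $"Myr"$ can be checked with a rough estimate. The sketch below assumes a matter-dominated universe, which is a reasonable approximation at these redshifts, and uses assumed cosmological parameters ($H_0 = 67.7 "km/s/Mpc"$, $Omega_"m" = 0.31$) that are not taken from the text. The function name `cosmic_age_gyr` is hypothetical.

```python
import math

# Assumed cosmological parameters (not taken from the text).
H0_KM_S_MPC = 67.7
OMEGA_M = 0.31
# Unit conversion: 1 / (1 km/s/Mpc) corresponds to about 977.8 Gyr.
GYR_PER_INVERSE_KM_S_MPC = 977.8


def cosmic_age_gyr(z):
    """Cosmic age in the matter-dominated approximation.

    t(z) = 2 / (3 * H0 * sqrt(Omega_m)) * (1 + z)^(-3/2)
    """
    hubble_time_gyr = GYR_PER_INVERSE_KM_S_MPC / H0_KM_S_MPC
    return 2.0 * hubble_time_gyr / (3.0 * math.sqrt(OMEGA_M)) * (1.0 + z) ** -1.5


# Time elapsed between z = 12 and z = 8, i.e. looking back Delta z = 4 from z = 8.
delta_t_myr = 1e3 * (cosmic_age_gyr(8.0) - cosmic_age_gyr(12.0))
```

Under these assumptions the interval comes out at a few hundred $"Myr"$, the same order as the value quoted above.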