Getting Started
Reading a Merger Forest
Merger forests are stored in the HDF5 format, and although one could read the HDF5
file directly, the read_forest()
function in haccytrees provides more
convenience, e.g. by creating additional indices that make walking the tree much
easier and by allowing to split the forest files into self-contained chunks.
For example, if we want to split a Last Journey merger forest into 10 chunks and read the first chunk:
import numpy as np
import haccytrees.mergertrees
forest, progenitor_array = haccytrees.mergertrees.read_forest(
"/data/a/cpac/mbuehlmann/LastJourney/m000p.forest.000.hdf5",
simulation="LastJourney",
nchunks=10, chunknum=0)
The returned forest
is a dictionary containing one-dimensional numpy arrays.
Extracting Main-Branch Matrices
Sometimes, only the main-branch (defined by following the most massive
progenitor at each timestep) is needed. The function get_mainbranch_indices()
is a convenient function to construct a matrix of shape (n_targets x n_steps),
where each column corresponds to the main branch of a halo, and each row
corresponds to an output step of the simulation. At times where a halo does not
exist, the index is set to -1.
This index matrix can then be used to get the history of any stored halo parameter. As an example, we can easily extract the mass history of all halos in the mass-bin [1e13, 2e13] at z=0:
z0_mask = forest['snapnum'] == 100
mlim = [1e13, 2e13]
target_mask = z0_mask \
& (forest['tree_node_mass'] > mlim[0]) \
& (forest['tree_node_mass'] < mlim[1])
target_idx = forest['halo_index'][target_mask]
# this will create a matrix of shape (ntargets, nsteps), where each row
# is the main progenitor branch of a target. It contains the indices to the
# forest data, and is -1 if the halo does not exist at that time
mainbranch_index = haccytrees.mergertrees.get_mainbranch_indices(
forest, simulation='LastJourney', target_index=target_idx
)
# Get the mass of the main branches
active_mask = mainbranch_index != -1
mainbranch_mass = np.zeros_like(mainbranch_index, dtype=np.float32)
mainbranch_mass[active_mask] = forest['tree_node_mass'][mainbranch_index[active_mask]]
Finding Major Mergers
Another common task is finding mergers above a certain relative or absolute
threshold. To get the merger ratio from the two most massive progenitors of a
list of target halos, the function get_nth_progenitor_indices()
can be
used as follows:
# get indices to main progenitors
main_progenitor_index = haccytrees.mergertrees.get_nth_progenitor_indices(
forest, progenitor_array, target_index=mainbranch_index[active_mask], n=1
)
# get indices to secondary progenitors (main mergers)
main_merger_index = haccytrees.mergertrees.get_nth_progenitor_indices(
forest, progenitor_array, target_index=mainbranch_index[active_mask], n=2
)
# the index will be negative if there's no merger, mask those out
merger_mask = main_merger_index >= 0
# allocate a merger_ratio matrix, 0 by default
merger_ratio = np.zeros_like(mainbranch_index, dtype=np.float32)
# fill the elements for which a merger occurred with the mass ratio
merger_ratio[tuple(np.argwhere(active_mask)[merger_mask].T)] = \
forest['tree_node_mass'][main_merger_index[merger_mask]] /
forest['tree_node_mass'][main_progenitor_index[merger_mask]]
Major mergers can then be identified by finding the entries in merger_ratio
above the major merger threshold.
If an absolute major merger criteria is required, we only have to extract the mass of the main merger (secondary progenitor), i.e.
# get indices to secondary progenitors (main mergers)
main_merger_index = haccytrees.mergertrees.get_nth_progenitor_indices(
forest, progenitor_array, target_index=mainbranch_index[active_mask], n=2
)
# the index will be negative if there's no merger, mask those out
merger_mask = main_merger_index >= 0
# allocate an array containing the merger masses, 0 by default
merger_mass = np.zeros_like(mainbranch_index, dtype=np.float32)
# fill the elements for which a merger occurred with the mass of the main merger
merger_mass[tuple(np.argwhere(active_mask)[merger_mask].T)] = \
forest['tree_node_mass'][main_merger_index[merger_mask]]
Then, halos that in the last timestep underwent a major merger defined by an
absolute mass threshold mass_threshold
, can be selected by merger_mass >=
mass_threshold
.
In both cases, the scale factor of the last major merger can be found by finding the last column at which the merger ratio or merger mass is above the threshold, i.e.
simulation = haccytrees.Simulation.simulations['LastJourney']
scale_factors = simulation.step2a(np.array(simulation.cosmotools_steps))
last_snap = len(simulation.cosmotools_steps) - 1
# major merger mask with a relative threshold
mm_mask = merger_ratio > threshold
# major merger mask with an absolute threshold
mm_mask = merger_mass > threshold
# finding the last index
last_mm_index = last_snap - np.argmax(mm_mask[:, ::-1], axis=1)
last_mm_redshift = 1/scale_factors[last_mm_index] - 1
# mark all halos without any major merger with a last_mm_redshift of -1
last_mm_redshift[~np.any(mm_mask, axis=1)] = -1
Obtaining a Histogram of Infall Masses
The mass distribution of the halos that merge onto the main-progenitor branches
of a halo (i.e. the infall masses) can be obtained with the function
get_infall_histogram()
.
# target all halos at z=0 in [10**13.0, 10**13.05] mass range
mask = forest['snapnum'] == 100
mask &= forest['tree_node_mass'] > 10**13.0
mask &= forest['tree_node_mass'] < 10**13.05
target_index = np.nonzero(mask)[0]
# upper and lower masses for histogram in log units
m_low = 11
m_high = 13
nbins = 50
infall_hist = haccytrees.mergertrees.get_infall_histogram(
fg_forest,
target_index,
10**m_low,
10**m_high,
nbins)
# calculate the bin centers
imass_edges = np.linspace(m_low, m_high, nbins+1, endpoint=True)
imass_centers = 0.5*(imass_edges[1:] + imass_edges[:-1])
imass_centers = 10**imass_centers
# plot the distribution
fig, ax = plt.subplots()
ax.step(imass_centers, infall_hist, where='center')
References
- haccytrees.mergertrees.read_forest(filename, simulation, *, nchunks=None, chunknum=None, include_fields=None, create_indices=True, add_scale_factor=True, mass_threshold=None, include_non_z0_roots=False)[source]
Read a HDF5 merger-forest
- Parameters:
filename (str) – the path to the merger forest file
simulation (str | Simulation) – either a valid simulation name or a Simulation instance, used to add the scale factor to the output
nchunks (int | None) – if not None, the file will be split in
nchunks
equally-sized partschunknum (int | None) – if
nchunks
is set,chunknum
determines which chunk number will be read. First chunk haschunknum=0
, has to be smaller thannchunks
.include_fields (List[str] | None) – the columns that will be read from the HDF5 file. If
None
, all data will be read. Note: some essential columns will always be read, checkhaccytrees.mergertrees.forest_reader._essential_fields
.create_indices (bool) – if
True
, will add descendant_idx``,progenitor_count
,progenitor_offset
to the forest and return theprogenitor_array
add_scale_factor (bool) – if
True
, will add the scale factor column to the forest datamass_threshold (float | None) – if not
None
, the reader will prune all halos below the specified mass threshold (in Msun/h)include_non_z0_roots (bool) – if True, will also include trees that are not rooted at z=0 (i.e. halos that for some reason “disappear” during the evolution)
- Returns:
forest (Mapping[str, np.ndarray]) – the merger tree forest data
progenitor_array (Optional[np.ndarray]) – a progenitor index array that can be used together with the
progenitor_offset
andprogenitor_count
arrays in the forest data in order to easily find all progenitors of a halo. Only returned ifcreate_indices=True
.
- Return type:
Tuple[Mapping[str, ndarray], ndarray | None]
- haccytrees.mergertrees.get_mainbranch_indices(forest, simulation, target_index)[source]
Extract main progenitor branches in a matrix format:
(n_targets x n_steps)
- Parameters:
forest (Mapping[str, ndarray]) – the full treenode forest returned by
read_forest()
simulation (str | Simulation) – either a valid simulation string or an instance of
Simulation
, required to get the number of cosmotools steps.target_index (ndarray) – the indices of the halos for which the main progenitor branch should be extracted.
- Returns:
mainbranch_indices – the indices of the halos in the main progenitor branches. Each column j corresponds to the main branch of the j-th target_halo. Each row corresponds to a cosmotools step (with 0 being the first step). At times where the halo did not exist, the index is -1.
- Return type:
np.ndarray
- haccytrees.mergertrees.get_nth_progenitor_indices(forest, progenitor_array, target_index, n)[source]
Extract indices of the n-th most massive progenitor for each target halo
The index array returned has the same length as target_index. Invalid indices are masked with -1 (i.e. if the halo does not have n progenitors)
- Parameters:
forest (Mapping[str, ndarray]) – the full treenode forest returned by
read_forest()
progenitor_array (ndarray) – the full progenitor array created by
read_forest()
target_index (ndarray) – the indices of the halos for which the merger indices should be extracted.
n (int) – the n-th progenitor.
0
corresponds to the main progenitor,1
to the main merger halo, etc.
- Returns:
merger_indices – the indices of the n-th most massive progenitors (determined by the tree-node mass). -1 if the progenitor does not exist.
- Return type:
np.ndarray
- haccytrees.mergertrees.get_infall_histogram(forest, target_index, mass_min, mass_max, nbins, logbins=True)[source]
Get a histogram of infall masses, integrated over all snapshots and target_index
This function counts the halos that fall onto the main progenitor branches of the halos specified by target_index and bins them according to their masses
- Parameters:
forest (Mapping[str, ndarray]) – the full treenode forest returned by
read_forest()
target_index (ndarray) – the indices of the halos for which the infall masses should be accumulated
mass_min (float) – lower mass bound for histogram
mass_max (float) – upper mass bound for histogram
nbins (int) – number of bins in the histogram, linearly or logarithmically spread from min to max
logbins (bool) – if
True
, the bins will be logarithmically distributed, otherwise linearly
- Returns:
counts – the indices of the n-th most massive progenitors (determined by the tree-node mass). -1 if the progenitor does not exist.
- Return type:
np.ndarray