Getting Started

Reading a Merger Forest

Merger forests are stored in the HDF5 format, and although one could read the HDF5 file directly, the read_forest() function in haccytrees provides more convenience, e.g. by creating additional indices that make walking the tree much easier and by allowing to split the forest files into self-contained chunks.

For example, if we want to split a Last Journey merger forest into 10 chunks and read the first chunk:

import numpy as np
import haccytrees.mergertrees

forest, progenitor_array = haccytrees.mergertrees.read_forest(
    "/data/a/cpac/mbuehlmann/LastJourney/m000p.forest.000.hdf5",
    simulation="LastJourney",
    nchunks=10, chunknum=0)

The returned forest is a dictionary containing one-dimensional numpy arrays.

Extracting Main-Branch Matrices

Sometimes, only the main-branch (defined by following the most massive progenitor at each timestep) is needed. The function get_mainbranch_indices() is a convenient function to construct a matrix of shape (n_targets x n_steps), where each column corresponds to the main branch of a halo, and each row corresponds to an output step of the simulation. At times where a halo does not exist, the index is set to -1.

This index matrix can then be used to get the history of any stored halo parameter. As an example, we can easily extract the mass history of all halos in the mass-bin [1e13, 2e13] at z=0:

z0_mask = forest['snapnum'] == 100
mlim = [1e13, 2e13]
target_mask = z0_mask \
              & (forest['tree_node_mass'] > mlim[0]) \
              & (forest['tree_node_mass'] < mlim[1])
target_idx = forest['halo_index'][target_mask]

# this will create a matrix of shape (ntargets, nsteps), where each row
# is the main progenitor branch of a target. It contains the indices to the
# forest data, and is -1 if the halo does not exist at that time
mainbranch_index = haccytrees.mergertrees.get_mainbranch_indices(
    forest, simulation='LastJourney', target_index=target_idx
)

# Get the mass of the main branches
active_mask = mainbranch_index != -1
mainbranch_mass = np.zeros_like(mainbranch_index, dtype=np.float32)
mainbranch_mass[active_mask] = forest['tree_node_mass'][mainbranch_index[active_mask]]

Finding Major Mergers

Another common task is finding mergers above a certain relative or absolute threshold. To get the merger ratio from the two most massive progenitors of a list of target halos, the function get_nth_progenitor_indices() can be used as follows:

# get indices to main progenitors
main_progenitor_index = haccytrees.mergertrees.get_nth_progenitor_indices(
    forest, progenitor_array, target_index=mainbranch_index[active_mask], n=1
)

# get indices to secondary progenitors (main mergers)
main_merger_index = haccytrees.mergertrees.get_nth_progenitor_indices(
    forest, progenitor_array, target_index=mainbranch_index[active_mask], n=2
)

# the index will be negative if there's no merger, mask those out
merger_mask = main_merger_index >= 0

# allocate a merger_ratio matrix, 0 by default
merger_ratio = np.zeros_like(mainbranch_index, dtype=np.float32)

# fill the elements for which a merger occurred with the mass ratio
merger_ratio[tuple(np.argwhere(active_mask)[merger_mask].T)] = \
    forest['tree_node_mass'][main_merger_index[merger_mask]] /
    forest['tree_node_mass'][main_progenitor_index[merger_mask]]

Major mergers can then be identified by finding the entries in merger_ratio above the major merger threshold.

If an absolute major merger criteria is required, we only have to extract the mass of the main merger (secondary progenitor), i.e.

# get indices to secondary progenitors (main mergers)
main_merger_index = haccytrees.mergertrees.get_nth_progenitor_indices(
    forest, progenitor_array, target_index=mainbranch_index[active_mask], n=2
)

# the index will be negative if there's no merger, mask those out
merger_mask = main_merger_index >= 0

# allocate an array containing the merger masses, 0 by default
merger_mass = np.zeros_like(mainbranch_index, dtype=np.float32)

# fill the elements for which a merger occurred with the mass of the main merger
merger_mass[tuple(np.argwhere(active_mask)[merger_mask].T)] = \
    forest['tree_node_mass'][main_merger_index[merger_mask]]

Then, halos that in the last timestep underwent a major merger defined by an absolute mass threshold mass_threshold, can be selected by merger_mass >= mass_threshold.

In both cases, the scale factor of the last major merger can be found by finding the last column at which the merger ratio or merger mass is above the threshold, i.e.

simulation = haccytrees.Simulation.simulations['LastJourney']
scale_factors = simulation.step2a(np.array(simulation.cosmotools_steps))
last_snap = len(simulation.cosmotools_steps) - 1

# major merger mask with a relative threshold
mm_mask = merger_ratio > threshold

# major merger mask with an absolute threshold
mm_mask = merger_mass > threshold

# finding the last index
last_mm_index = last_snap - np.argmax(mm_mask[:, ::-1], axis=1)

last_mm_redshift = 1/scale_factors[last_mm_index] - 1

# mark all halos without any major merger with a last_mm_redshift of -1
last_mm_redshift[~np.any(mm_mask, axis=1)] = -1

Obtaining a Histogram of Infall Masses

The mass distribution of the halos that merge onto the main-progenitor branches of a halo (i.e. the infall masses) can be obtained with the function get_infall_histogram().

# target all halos at z=0 in [10**13.0, 10**13.05] mass range
mask = forest['snapnum'] == 100
mask &= forest['tree_node_mass'] > 10**13.0
mask &= forest['tree_node_mass'] < 10**13.05
target_index = np.nonzero(mask)[0]

# upper and lower masses for histogram in log units
m_low = 11
m_high = 13
nbins = 50

infall_hist = haccytrees.mergertrees.get_infall_histogram(
    fg_forest,
    target_index,
    10**m_low,
    10**m_high,
    nbins)

# calculate the bin centers
imass_edges = np.linspace(m_low, m_high, nbins+1, endpoint=True)
imass_centers = 0.5*(imass_edges[1:] + imass_edges[:-1])
imass_centers = 10**imass_centers

# plot the distribution
fig, ax = plt.subplots()
ax.step(imass_centers, infall_hist, where='center')

References

haccytrees.mergertrees.read_forest(filename, simulation, *, nchunks=None, chunknum=None, include_fields=None, create_indices=True, add_scale_factor=True, mass_threshold=None, include_non_z0_roots=False)[source]

Read a HDF5 merger-forest

Parameters:

filename (str) – the path to the merger forest file
simulation (str | Simulation) – either a valid simulation name or a Simulation instance, used to add the scale factor to the output
nchunks (int | None) – if not None, the file will be split in nchunks equally-sized parts
chunknum (int | None) – if nchunks is set, chunknum determines which chunk number will be read. First chunk has chunknum=0, has to be smaller than nchunks.
include_fields (List[str] | None) – the columns that will be read from the HDF5 file. If None, all data will be read. Note: some essential columns will always be read, check haccytrees.mergertrees.forest_reader._essential_fields.
create_indices (bool) – if True, will add descendant_idx``, progenitor_count, progenitor_offset to the forest and return the progenitor_array
add_scale_factor (bool) – if True, will add the scale factor column to the forest data
mass_threshold (float | None) – if not None, the reader will prune all halos below the specified mass threshold (in Msun/h)
include_non_z0_roots (bool) – if True, will also include trees that are not rooted at z=0 (i.e. halos that for some reason “disappear” during the evolution)

Returns:

forest (Mapping[str, np.ndarray]) – the merger tree forest data
progenitor_array (Optional[np.ndarray]) – a progenitor index array that can be used together with the progenitor_offset and progenitor_count arrays in the forest data in order to easily find all progenitors of a halo. Only returned if create_indices=True.

Return type:

Tuple[Mapping[str, ndarray], ndarray | None]

haccytrees.mergertrees.get_mainbranch_indices(forest, simulation, target_index)[source]

Extract main progenitor branches in a matrix format: (n_targets x n_steps)

Parameters:

forest (Mapping[str, ndarray]) – the full treenode forest returned by read_forest()
simulation (str | Simulation) – either a valid simulation string or an instance of Simulation, required to get the number of cosmotools steps.
target_index (ndarray) – the indices of the halos for which the main progenitor branch should be extracted.

Returns:

mainbranch_indices – the indices of the halos in the main progenitor branches. Each column j corresponds to the main branch of the j-th target_halo. Each row corresponds to a cosmotools step (with 0 being the first step). At times where the halo did not exist, the index is -1.

Return type:

np.ndarray

haccytrees.mergertrees.get_nth_progenitor_indices(forest, progenitor_array, target_index, n)[source]

Extract indices of the n-th most massive progenitor for each target halo

The index array returned has the same length as target_index. Invalid indices are masked with -1 (i.e. if the halo does not have n progenitors)

Parameters:

forest (Mapping[str, ndarray]) – the full treenode forest returned by read_forest()
progenitor_array (ndarray) – the full progenitor array created by read_forest()
target_index (ndarray) – the indices of the halos for which the merger indices should be extracted.
n (int) – the n-th progenitor. 0 corresponds to the main progenitor, 1 to the main merger halo, etc.

Returns:

merger_indices – the indices of the n-th most massive progenitors (determined by the tree-node mass). -1 if the progenitor does not exist.

Return type:

np.ndarray

haccytrees.mergertrees.get_infall_histogram(forest, target_index, mass_min, mass_max, nbins, logbins=True)[source]

Get a histogram of infall masses, integrated over all snapshots and target_index

This function counts the halos that fall onto the main progenitor branches of the halos specified by target_index and bins them according to their masses

Parameters:

forest (Mapping[str, ndarray]) – the full treenode forest returned by read_forest()
target_index (ndarray) – the indices of the halos for which the infall masses should be accumulated
mass_min (float) – lower mass bound for histogram
mass_max (float) – upper mass bound for histogram
nbins (int) – number of bins in the histogram, linearly or logarithmically spread from min to max
logbins (bool) – if True, the bins will be logarithmically distributed, otherwise linearly

Returns:

counts – the indices of the n-th most massive progenitors (determined by the tree-node mass). -1 if the progenitor does not exist.

Return type:

np.ndarray