Getting Started 
===============

.. currentmodule:: haccytrees.mergertrees


.. _sec-reading-a-forest:

Reading a Merger Forest
-----------------------

Merger forests are stored in the HDF5 format, and although one could read the HDF5
file directly, the :meth:`read_forest` function in haccytrees provides more
convenience, e.g. by creating additional indices that make walking the tree much
easier and by allowing to split the forest files into self-contained chunks.

For example, if we want to split a Last Journey merger forest into 10 chunks and
read the first chunk:

.. code-block:: python

   import numpy as np
   import haccytrees.mergertrees

   forest, progenitor_array = haccytrees.mergertrees.read_forest(
       "/data/a/cpac/mbuehlmann/LastJourney/m000p.forest.000.hdf5", 
       simulation="LastJourney", 
       nchunks=10, chunknum=0)

The returned ``forest`` is a dictionary containing one-dimensional numpy arrays.


Extracting Main-Branch Matrices
-------------------------------

Sometimes, only the main-branch (defined by following the most massive
progenitor at each timestep) is needed. The function :func:`get_mainbranch_indices`
is a convenient function to construct a matrix of shape `(n_targets x n_steps)`,
where each column corresponds to the main branch of a halo, and each row 
corresponds to an output step of the simulation. At times where a halo does not
exist, the index is set to -1.

This index matrix can then be used to get the history of any stored halo
parameter. As an example, we can easily extract the mass history of all halos in
the mass-bin [1e13, 2e13] at z=0:

.. code-block:: python

    z0_mask = forest['snapnum'] == 100
    mlim = [1e13, 2e13]
    target_mask = z0_mask \
                  & (forest['tree_node_mass'] > mlim[0]) \
                  & (forest['tree_node_mass'] < mlim[1])
    target_idx = forest['halo_index'][target_mask]
        
    # this will create a matrix of shape (ntargets, nsteps), where each row 
    # is the main progenitor branch of a target. It contains the indices to the 
    # forest data, and is -1 if the halo does not exist at that time
    mainbranch_index = haccytrees.mergertrees.get_mainbranch_indices(
        forest, simulation='LastJourney', target_index=target_idx
    )
        
    # Get the mass of the main branches
    active_mask = mainbranch_index != -1
    mainbranch_mass = np.zeros_like(mainbranch_index, dtype=np.float32)
    mainbranch_mass[active_mask] = forest['tree_node_mass'][mainbranch_index[active_mask]]


Finding Major Mergers
---------------------

Another common task is finding mergers above a certain relative or absolute
threshold. To get the merger ratio from the two most massive progenitors of a
list of target halos, the function :func:`get_nth_progenitor_indices` can be
used as follows:

.. code-block:: python

   # get indices to main progenitors
   main_progenitor_index = haccytrees.mergertrees.get_nth_progenitor_indices(
       forest, progenitor_array, target_index=mainbranch_index[active_mask], n=1
   )

   # get indices to secondary progenitors (main mergers)
   main_merger_index = haccytrees.mergertrees.get_nth_progenitor_indices(
       forest, progenitor_array, target_index=mainbranch_index[active_mask], n=2
   )

   # the index will be negative if there's no merger, mask those out
   merger_mask = main_merger_index >= 0

   # allocate a merger_ratio matrix, 0 by default
   merger_ratio = np.zeros_like(mainbranch_index, dtype=np.float32)

   # fill the elements for which a merger occurred with the mass ratio
   merger_ratio[tuple(np.argwhere(active_mask)[merger_mask].T)] = \
       forest['tree_node_mass'][main_merger_index[merger_mask]] / 
       forest['tree_node_mass'][main_progenitor_index[merger_mask]]
   
   
Major mergers can then be identified by finding the entries in ``merger_ratio``
above the major merger threshold. 

If an absolute major merger criteria is required, we only have to extract the
mass of the main merger (secondary progenitor), i.e.

.. code-block:: python

   # get indices to secondary progenitors (main mergers)
   main_merger_index = haccytrees.mergertrees.get_nth_progenitor_indices(
       forest, progenitor_array, target_index=mainbranch_index[active_mask], n=2
   )

   # the index will be negative if there's no merger, mask those out
   merger_mask = main_merger_index >= 0

   # allocate an array containing the merger masses, 0 by default
   merger_mass = np.zeros_like(mainbranch_index, dtype=np.float32)

   # fill the elements for which a merger occurred with the mass of the main merger
   merger_mass[tuple(np.argwhere(active_mask)[merger_mask].T)] = \
       forest['tree_node_mass'][main_merger_index[merger_mask]] 

Then, halos that in the last timestep underwent a major merger defined by an
absolute mass threshold ``mass_threshold``, can be selected by ``merger_mass >=
mass_threshold``.

In both cases, the scale factor of the last major merger can be found by finding
the last column at which the merger ratio or merger mass is above the threshold,
i.e.

.. code-block:: python

   simulation = haccytrees.Simulation.simulations['LastJourney']
   scale_factors = simulation.step2a(np.array(simulation.cosmotools_steps))
   last_snap = len(simulation.cosmotools_steps) - 1

   # major merger mask with a relative threshold
   mm_mask = merger_ratio > threshold

   # major merger mask with an absolute threshold
   mm_mask = merger_mass > threshold

   # finding the last index
   last_mm_index = last_snap - np.argmax(mm_mask[:, ::-1], axis=1)

   last_mm_redshift = 1/scale_factors[last_mm_index] - 1
   
   # mark all halos without any major merger with a last_mm_redshift of -1
   last_mm_redshift[~np.any(mm_mask, axis=1)] = -1


Obtaining a Histogram of Infall Masses
--------------------------------------

The mass distribution of the halos that merge onto the main-progenitor branches
of a halo (i.e. the infall masses) can be obtained with the function 
:meth:`get_infall_histogram`.

.. code-block:: python

   # target all halos at z=0 in [10**13.0, 10**13.05] mass range
   mask = forest['snapnum'] == 100
   mask &= forest['tree_node_mass'] > 10**13.0
   mask &= forest['tree_node_mass'] < 10**13.05
   target_index = np.nonzero(mask)[0]
   
   # upper and lower masses for histogram in log units
   m_low = 11
   m_high = 13
   nbins = 50
   
   infall_hist = haccytrees.mergertrees.get_infall_histogram(
       fg_forest, 
       target_index, 
       10**m_low, 
       10**m_high, 
       nbins)

   # calculate the bin centers 
   imass_edges = np.linspace(m_low, m_high, nbins+1, endpoint=True)
   imass_centers = 0.5*(imass_edges[1:] + imass_edges[:-1])
   imass_centers = 10**imass_centers

   # plot the distribution
   fig, ax = plt.subplots()
   ax.step(imass_centers, infall_hist, where='center')


--------------------------------------------------------------------------------

References
----------

.. autofunction:: read_forest

.. autofunction:: get_mainbranch_indices

.. autofunction:: get_nth_progenitor_indices

.. autofunction:: get_infall_histogram