Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Compute Mass Accretion and Star Formation Histories on Diffsky Catalogs

Data Diffsky Galaxies
Tasks Query Select Filter Add Columns Evaluate

The diffsky mock galaxy catalogs are available on NERSC to anyone with an account. There are two options for accessing the data:

  1. Access directly on NERSC via Jupyter or a job script

  2. Use the OpenCosmo Compute Portal to query a smaller subset of the data, and move the result to a computing resource of your choice. To request access, fill out this Google form

Feel free to use either (or both!) of these methods. In either case, the data will be stored in the OpenCosmo Format. You can use the OpenCosmo Python Toolkit to access this data and further query it. If working on NERSC, you can find the data at /global/cfs/cdirs/hacc/OpenCosmo/LastJourney/synthetic_galaxies/

Computing Mass Accretion and Star Formation Histories with Diffmah

This notebook demonstrates how to compute mass accretion and star formation histories for a population of synthetic galaxies, with data stored in the OpenCosmo Format.

Step 0: Install Requirements

In order to run this notebook, you will need to be using an environment with opencosmo, diffmah, and diffstar installed.

You can install opencosmo with:

conda create -n diffsky python=3.12 conda-forge::opencosmo

Once opencosmo is installed, install the diffsky packages with:

conda activate diffsky
opencosmo install diffsky --file /path/to/synthetic/catalog.hdf5

If you’re working with the original data on NERSC, the dataset will be split into a large number of individual files. You may use any of them for the step above. The file contains information about the versions of Diffmah and Diffstar that were used to produce the catalog. The command above will automatically install the correct versions into your environment, as well as additional dependencies like jax. See below for the paths.

Additional information about the OpenCosmo toolkit, including additional installation options, can be found on ReadTheDocs

Step 1: Selecting a Catalog and Loading the Data

The data folder contains several different catalogs, based on the dataset they are calibrated against. Catalogs are tagged with the date they were produced. For each callibration there is also a “latest” folder, which always points to the most recent version of the catalogs for a given calibration.

Now that we have the software we need, we can load the data using the OpenCosmo toolkit. We recommend passing all the files in the catalog to the toolkit. Don’t worry, it won’t actually read anything yet. We’ll restrict to a subset of the data later.

Note about Synthetic Cores

By default, every mock galaxy in Diffsky lives in a “subhalo core” that was resolved in the underlying N-body simulation (for more information about subhalo cores, see Rangel+20, https://arxiv.org/abs/2008.08519). Some Diffsky mocks include an additional population of faint galaxies that reside in halos below the mass resolution limit of the N-body simulation. These faint galaxies have spatially random values of ra and dec, but all of their other physical properties (including redshift and shear) are modeled in the same way as galaxies that live in simulated cores. It is necessary to include these galaxies in analyses that require magnitude-complete samples below the resolution limit of the simulation, but be aware that their inclusion comes with a factor-of-many increase in data volume, so you may wish to exclude them unless you know you need them.

Not all catalogs have synthetic cores, but if you are working with one that does you can include them by setting the synth_cores flag to True when you open the data, e.g.

import opencosmo as oc
oc.open("lc_cores-411.diffsky_gals.hdf5", synth_cores=True)
from pathlib import Path
import opencosmo as oc
import numpy as np
import matplotlib.pyplot as plt
# This tutorial assumes you are running on NERSC
all_catalogs_path = Path("/global/cfs/cdirs/hacc/OpenCosmo/LastJourney/synthetic_galaxies/")
available_catalogs = list(p.stem for p in all_catalogs_path.glob("*") if "latest" in p.stem)
print(available_catalogs)
# Select one of the catalogs, and open the data
data_path = all_catalogs_path / "galacticus_in_plus_ex_situ_latest"
files = list(f for f in data_path.glob("*.hdf5") if f.stem.startswith("lc_cores"))
ds = oc.open(*files)

Step 1:

If you are running on NERSC, the data are much too large to access all at once. We can cut it down a bit to make it more manageable. The cell below will select 50,000 galaxies at random between redshift 0.5 and 1.0

ds = ds.with_redshift_range(0.5, 1.0)
ds = ds.take(50_000)

The OpenCosmo dataset object does not actually contain any data yet, it just knows how to go get the data when you ask for it. However it does have lots of additional useful information, such as the cosmology of the simulation this catalog was produced from. We can preview the dataset by printing it

ds

Step 2: Get the Cosmology Information

Diffmah and Diffstar were produced from from halo catlogs from the LastJourney simulation. In order to produce the fits, we need to provide them with information about the cosmology this simulation was run with. We can easily access the astropy Cosmology object for the simulation with the cosmology attribute.

cosmology = ds.cosmology
cosmology

Step 3: Retrive the parameters for the Mass Accretion History fits

To compute the mass accretion histories, we need to retrieve the relavent columns from the catalog. By selecting these columns with ds.select, we ensure OpenCosmo doesn’t try to load a ton of unnecessary data into memory.

from diffmah import DEFAULT_MAH_PARAMS
# Get the names of the relevant columns
diffmah_columns = [key for key in DEFAULT_MAH_PARAMS._fields]
# Getting the data for those columns as numpy arrays 
diffmah_data = ds.select(diffmah_columns).get_data("numpy")
# Convert to the format Diffmah expects
diffmah_data = [diffmah_data[key] for key in diffmah_columns]
mah_params = DEFAULT_MAH_PARAMS._make(diffmah_data)

Step 4: Compute Mass Accretion Histories and make a plot

from diffmah import mah_halopop

t0 = cosmology.age(0).value
tarr = np.linspace(0.5, t0, 100)

dmdht, log_mah = mah_halopop(mah_params, tarr, np.log10(t0))

fig, ax = plt.subplots(1, 1)

for i in range(10):
    __=ax.plot(tarr, log_mah[i, :])

_ = ax.set_ylim(8,12.5)
ax.set_xlabel("Cosmic Time (Gyr)")
ax.set_ylabel(r"Maximum Halo Mass   $log(M_{halo}/M_{sun})$")

Step 5: Repeat Step 2 for Star Formation History Parameters

from diffstar import DEFAULT_DIFFSTAR_PARAMS
sfh_columns = DEFAULT_DIFFSTAR_PARAMS.ms_params._fields +  DEFAULT_DIFFSTAR_PARAMS.q_params._fields
diffstar_data = ds.select(sfh_columns).data

sfh_ms_data = [diffstar_data[key].value for key in DEFAULT_DIFFSTAR_PARAMS.ms_params._fields]
sfh_q_data = [diffstar_data[key].value for key in DEFAULT_DIFFSTAR_PARAMS.q_params._fields]

sfh_ms_params = DEFAULT_DIFFSTAR_PARAMS.ms_params._make(sfh_ms_data)
sfh_q_params = DEFAULT_DIFFSTAR_PARAMS.q_params._make(sfh_q_data)

sfh_params = DEFAULT_DIFFSTAR_PARAMS._make((sfh_ms_params, sfh_q_params))

Step 6: Compute SFH and make a plot

from diffstar import calc_sfh_galpop
fb_last_journey = cosmology.Ob0 / cosmology.Om0

sfh, smh = calc_sfh_galpop(sfh_params, mah_params, tarr, lgt0=np.log10(t0), fb=fb_last_journey, return_smh=True)

fig, ax = plt.subplots(1, 1)
yscale = ax.set_yscale('log')
yscale = ax.set_ylim(1e-5, 200)

for i in range(10):
    __=ax.plot(tarr, sfh[i, :])

ax.set_ylim(1e-3, 1e2)