Accessing the "fuzz" (spatial query/cutout)

Chris Miller
  • 3 Jun '19

I am trying to get all of the particles (gas, dm, stars) within a certain volume (e.g., a cube of a specified size centered on some specified location).

According to the example scripts, the following should work:
dm_pos = il.snapshot.loadSubset(basePath,99,'dm',['Coordinates'])
However, I am getting a memory error for TNG300.

I could try and find a way to use loadHalo around a list of FoF halos, but this must be missing "inner fuzz" and "outer fuzz" (http://www.tng-project.org/data/docs/specifications/ )

Can you suggest an algorithm to help? Is there a ways to load the "fuzz" separately?

Thanks,
Chris M

Dylan Nelson
  • 2
  • 8 Jun '19

Hi Chris,

For TNG300-1, this load command requires exactly 2500^3 * 3 * 8 bytes (the 3 from x,y,z, the 8 from float64 each). This is 350 GB, which is much more than I can allow on the JupyterLab interface, for instance.

If you have an analysis node with this much memory, and have downloaded the data there, then this will certainly work. I do indeed often work like this, as it will be the most efficient way.

But if you don't, there really isn't any need to load all the particles at the same time. You could e.g. load just 1% of the particles (requiring 3.5 GB) of memory, select those within the volume of interest, compute whatever you need (e.g. a radial profile?), then discard those particles, load the next 1%, and so on.

If the type of computation you have requires simultaneously all particles in the volume, then you could do it in two phases, first: locate all the particle indices in the volume with the chunked approached as above, save this index set, then load only the properties of those particles.

Yuanyuan Zhang
  • 20 Sep '19

Hi Chris and Dylan,

I am now running into the same issues with Chris, i.e., get all of the particles (gas, dm, stars) within a certain volume (e.g., a cube of a specified size centered on some specified location) for TNG300-1, for example.

Do you might have some example script that can do the "the chunked approached"?
I am trying the approach you suggested in another post "Loading the TNG100-1 data", but the "snapNumChunks" function is missing.

Thanks in advance.

Yuanyuan Zhang
  • 3
  • 20 Sep '19

I am trying to reuse a few functions from the illustris_python.snapshot to create a subset dictionary, modifying the corresponding values to chunk up the particle files, and pass to il.snapshot.loadSubset, seems to be working ...

import illustris_python as il
import numpy as np
from illustris_python.snapshot import getSnapOffsets, snapPath, getNumPart
from illustris_python.util import partTypeNum

basePath = '../sims.TNG/TNG100-3/output/'
subset = getSnapOffsets(basePath, 84, 0, "Subhalo")

with h5py.File(snapPath(basePath, 84), 'r') as f:
        header = dict(f['Header'].attrs.items())
        nPart = getNumPart(header)

startpoint=0
chunk_len=500000
sum0=0
while(startpoint < nPart[partTypeNum(partType)]):
     len_range=np.min([chunk_len, nPart[partTypeNum(partType)]-startpoint])
     subset['offsetType'][partTypeNum(partType)] = startpoint
     subset['lenType'][partTypeNum(partType)]    = len_range
     dm = il.snapshot.loadSubset(basePath, 84, 'dm', fields=['Velocities','Coordinates'], subset=subset)
     startpoint=startpoint+chunk_len
     print(len(dm['Coordinates'][:, 0]))
     sum0 = sum0+len(dm['Coordinates'][:, 0])
print(sum0)
Chris Miller
  • 1
  • 20 Sep '19

Hi Yuanyuan,
Below is a code snippet for my TNG volume search:

#My machine would need 350GB of memory (95 avail) to read in the entire set of HDF particles files at once. Instead, we read 1/10th of them at a time (thus the length). 
part_len = 15625000000.
loops = 10.0
bin_size = part_len/loops

#We are looking at a 5x5x5Mpc box around each location. We need to be real careful of when "h" lives in the units, else we get this wrong.
#For Illustris, they are pretty good at keeping everything in inverse h distance units.
vol_size = 5000.0

for i in range(0,int(loops)):
 start = int(i*bin_size)
 stop = int((i+1)*bin_size - 1)
 # print i, start, stop
 #we use the simple access mechaism to access the data (h5py). Usage is described here: http://www.tng-project.org/data/docs/specifications/
 #Note that particles for any given halo can live in any of the HDF snapshot files. In other words, halo #0 might have particles that live
 #in an HDF file that is not used for the first 156250000 particles and so we would only catch it on the second (or third...tenth) loop.
 #For now, we therefore save all of the particles for EACH HALO at A SPECIFIC LOOP over the particles. This means that we will need to
 #do a second round of data cleaning where we combine the particles from each loop for any specific halo back into one single particle file
 #for  that halo. Below, str(clusters[0][j]) puts the halo id in the filename while str(i) denotes the loop #.
 with h5py.File('simulation.hdf5','r') as f:
    dm_positions = f['/Snapshots/99/PartType1/Coordinates'][start:stop,:]
    for j in range(0,len(centers[0])):
        print j
        w = np.where((np.abs(dm_positions[:,0] - clusters[8][j][0]) < vol_size) & (np.abs(dm_positions[:,1] - clusters[8][j][1]) < vol_size) & (np.abs(dm_positions[:,2] - clusters[8][j][2]) < vol_size))[0]
        if (len(w) > 0):
          filename = front + '/particles/halo_' + str(clusters[0][j]) + '_positions_' + str(i) + '.fits'
          t = Table([dm_positions[w,0]/1000.0,dm_positions[w,1]/1000.0, dm_positions[w,2]/1000.0],  names=('px','py','pz'))
          t.write(filename, format='fits',overwrite=True)
 #We cannot read in both the velocities and the particles at the same time or else we run out of memory. We could just shrink the size of 
 #each chunk (i.e., increase the number of loops). Instead, i choose to write out the list of member tracers. I will use these later on the
 #velocities.
          filename = front + '/particles/halo_' + str(clusters[0][j]) + '_members_' + str(i) + '.fits'
          t = Table([w],names=('w'))
          t.write(filename, format='fits',overwrite=True)
 #I have to delete the variable before I even try to read in the next set, else we get memory issues.
    del dm_positions
#To get the velocities, I use the vector of saved members, which goes a lot faster.
Yuanyuan Zhang
  • 23 Sep '19

Thank you Chris!

Benjamin Metha
  • 8 Mar '22

Hello Illustrians,

I am about to begin a project where I am assessing the rates of rare supernova events in the universe. These will probably happen in halos and subhalos, but this is not guaranteed.

From what I have read above, accessing the fuzz seems difficult, consuming time and energy. I don't think it will be necessary for my science -- but to scientifically justify this feeling, can you tell me the total mass/SFR of the inner and outer fuzz in TNG-50? How does this compare to the mass/SFR contained within halos and subhalos? Under the IllustrisTNG code, is star formation in the inner/outer fuzz expressly forbidden, or just very unlikely?

Thanks for your help,
Benji

Dylan Nelson
  • 9 Mar '22

Dear Benji,

The "outer fuzz" is material outside of all halos, so you could also call this (more or less) "IGM".

Star formation in the IGM isn't technically forbidden, but I don't expect it ever occurs in practice. This is because star formation requires a gas density higher than 0.1 hydrogen atoms / cm^3, and gas in the IGM does not (more or less) achieve such high densities.

There is, however, lots of mass in the IGM. At z=0, the rule of thumb is that: half of all particles (gas and DM) are actually outside of halos. (There are very few stars outside of halos).

Ainhoa Zubiaur Arsuaga
  • 1
  • 15 Jul

Hi,
I'm having a problem related to choosing a subset of the simulation. I tried the scripts proposed in this thread as examples but I get errors and I don't really know why. The error regarding Yuanyuan's code is

"FileNotFoundError: [Errno 2] Unable to synchronously open file (unable to open file: name = '../sims.TNG/TNG100-3/output//../postprocessing/offsets/offsets_084.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)"
and the error regarding Chris' script is
"FileNotFoundError: [Errno 2] Unable to synchronously open file (unable to open file: name = 'simulation.hdf5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)",

which is really similar.

I guess that there is something wrong with the path I'm choosing but I don't know how to fix it.

So, how can I chose a subset of a simulation?

Thanks in advance :)

Dylan Nelson
  • 16 Jul

The code snippets here may provide some inspiration, but you will have to adapt them to your needs, and make them work.

The errors you have above, are just path errors, they are not related.

  • Page 1 of 1