Accessing the "fuzz"

Chris Miller
  • 3 Jun

I am trying to get all of the particles (gas, dm, stars) within a certain volume (e.g., a cube of a specified size centered on some specified location).

According to the example scripts, the following should work:
dm_pos = il.snapshot.loadSubset(basePath,99,'dm',['Coordinates'])
However, I am getting a memory error for TNG300.

I could try and find a way to use loadHalo around a list of FoF halos, but this must be missing "inner fuzz" and "outer fuzz" (http://www.tng-project.org/data/docs/specifications/ )

Can you suggest an algorithm to help? Is there a ways to load the "fuzz" separately?

Thanks,
Chris M

Dylan Nelson

Hi Chris,

For TNG300-1, this load command requires exactly 2500^3 * 3 * 8 bytes (the 3 from x,y,z, the 8 from float64 each). This is 350 GB, which is much more than I can allow on the JupyterLab interface, for instance.

If you have an analysis node with this much memory, and have downloaded the data there, then this will certainly work. I do indeed often work like this, as it will be the most efficient way.

But if you don't, there really isn't any need to load all the particles at the same time. You could e.g. load just 1% of the particles (requiring 3.5 GB) of memory, select those within the volume of interest, compute whatever you need (e.g. a radial profile?), then discard those particles, load the next 1%, and so on.

If the type of computation you have requires simultaneously all particles in the volume, then you could do it in two phases, first: locate all the particle indices in the volume with the chunked approached as above, save this index set, then load only the properties of those particles.

Yuanyuan Zhang
  • 20 Sep

Hi Chris and Dylan,

I am now running into the same issues with Chris, i.e., get all of the particles (gas, dm, stars) within a certain volume (e.g., a cube of a specified size centered on some specified location) for TNG300-1, for example.

Do you might have some example script that can do the "the chunked approached"?
I am trying the approach you suggested in another post "Loading the TNG100-1 data", but the "snapNumChunks" function is missing.

Thanks in advance.

Yuanyuan Zhang
  • 3
  • 20 Sep

I am trying to reuse a few functions from the illustris_python.snapshot to create a subset dictionary, modifying the corresponding values to chunk up the particle files, and pass to il.snapshot.loadSubset, seems to be working ...

import illustris_python as il
import numpy as np
from illustris_python.snapshot import getSnapOffsets, snapPath, getNumPart
from illustris_python.util import partTypeNum

basePath = '../sims.TNG/TNG100-3/output/'
subset = getSnapOffsets(basePath, 84, 0, "Subhalo")

with h5py.File(snapPath(basePath, 84), 'r') as f:
        header = dict(f['Header'].attrs.items())
        nPart = getNumPart(header)

startpoint=0
chunk_len=500000
sum0=0
while(startpoint < nPart[partTypeNum(partType)]):
     len_range=np.min([chunk_len, nPart[partTypeNum(partType)]-startpoint])
     subset['offsetType'][partTypeNum(partType)] = startpoint
     subset['lenType'][partTypeNum(partType)]    = len_range
     dm = il.snapshot.loadSubset(basePath, 84, 'dm', fields=['Velocities','Coordinates'], subset=subset)
     startpoint=startpoint+chunk_len
     print(len(dm['Coordinates'][:, 0]))
     sum0 = sum0+len(dm['Coordinates'][:, 0])
print(sum0)
Chris Miller
  • 1
  • 20 Sep

Hi Yuanyuan,
Below is a code snippet for my TNG volume search:

#My machine would need 350GB of memory (95 avail) to read in the entire set of HDF particles files at once. Instead, we read 1/10th of them at a time (thus the length). 
part_len = 15625000000.
loops = 10.0
bin_size = part_len/loops

#We are looking at a 5x5x5Mpc box around each location. We need to be real careful of when "h" lives in the units, else we get this wrong.
#For Illustris, they are pretty good at keeping everything in inverse h distance units.
vol_size = 5000.0

for i in range(0,int(loops)):
 start = int(i*bin_size)
 stop = int((i+1)*bin_size - 1)
 # print i, start, stop
 #we use the simple access mechaism to access the data (h5py). Usage is described here: http://www.tng-project.org/data/docs/specifications/
 #Note that particles for any given halo can live in any of the HDF snapshot files. In other words, halo #0 might have particles that live
 #in an HDF file that is not used for the first 156250000 particles and so we would only catch it on the second (or third...tenth) loop.
 #For now, we therefore save all of the particles for EACH HALO at A SPECIFIC LOOP over the particles. This means that we will need to
 #do a second round of data cleaning where we combine the particles from each loop for any specific halo back into one single particle file
 #for  that halo. Below, str(clusters[0][j]) puts the halo id in the filename while str(i) denotes the loop #.
 with h5py.File('simulation.hdf5','r') as f:
    dm_positions = f['/Snapshots/99/PartType1/Coordinates'][start:stop,:]
    for j in range(0,len(centers[0])):
        print j
        w = np.where((np.abs(dm_positions[:,0] - clusters[8][j][0]) < vol_size) & (np.abs(dm_positions[:,1] - clusters[8][j][1]) < vol_size) & (np.abs(dm_positions[:,2] - clusters[8][j][2]) < vol_size))[0]
        if (len(w) > 0):
          filename = front + '/particles/halo_' + str(clusters[0][j]) + '_positions_' + str(i) + '.fits'
          t = Table([dm_positions[w,0]/1000.0,dm_positions[w,1]/1000.0, dm_positions[w,2]/1000.0],  names=('px','py','pz'))
          t.write(filename, format='fits',overwrite=True)
 #We cannot read in both the velocities and the particles at the same time or else we run out of memory. We could just shrink the size of 
 #each chunk (i.e., increase the number of loops). Instead, i choose to write out the list of member tracers. I will use these later on the
 #velocities.
          filename = front + '/particles/halo_' + str(clusters[0][j]) + '_members_' + str(i) + '.fits'
          t = Table([w],names=('w'))
          t.write(filename, format='fits',overwrite=True)
 #I have to delete the variable before I even try to read in the next set, else we get memory issues.
    del dm_positions
#To get the velocities, I use the vector of saved members, which goes a lot faster.
Yuanyuan Zhang
  • 23 Sep

Thank you Chris!

  • Page 1 of 1