Quick Way To Get Info On Nearby Halos?

Harry Chittenden
  • 1
  • 9 Nov '20


I am using the Jupyter workspace, and am trying to extract information about subhalos whose centres of mass are within a certain distance of the centre of mass of a target subhalo. I am trying to do this for a specific list of subhalos in TNG100-1 and in TNG300-1, at multiple snapshots.
This is painfully slow. I left this running over the weekend and have only retrieved information for one snapshot in TNG100-1, and I'm pretty sure that running this for TNG300-1 causes the kernel to become unresponsive. For part of my code, see below.
Is there a more efficient way to recursively pick out all the subhalos in multiple local volumes as I'm doing? Would appreciate any useful advice.


reqfields = ['SubhaloMassType', 'SubhaloCM']
sh1=il.groupcat.loadSubhalos('sims.TNG/TNG100-1/output/', 99, fields=reqfields)
sh3=il.groupcat.loadSubhalos('sims.TNG/TNG300-1/output/', 99, fields=reqfields)

CM100 = sh1['SubhaloCM'] / h
CM300 = sh3['SubhaloCM'] / h
MH100 = sh1['SubhaloMassType'][:,1] * 1e10 / 0.6774
MH300 = sh3['SubhaloMassType'][:,1] * 1e10 / 0.6774

ids100 = np.genfromtxt('ids100.txt')
ids300 = np.genfromtxt('ids300.txt')

env100 = []
for i in ids100:
    obj = il.groupcat.loadSingle('sims.TNG/TNG100-1/output/', 99, subhaloID=i)
    cm = obj['SubhaloCM'] / h
    D = np.absolute(CM100 - cm)
    Dm = np.max(D, axis=1)
    R = np.sqrt(np.sum((CM100-cm)**2, axis=1))
    w = np.where((Dm>0) & (Dm<1000) & (R>0))[0]
    if len(w) > 0:
        MHw = MH100[w]
Dylan Nelson
  • 10 Nov '20

Hi Harry,

There are two issues here.

First, a good rule is "never load anything from disk, once per halo". So you need to remove the il.groupcat.loadSingle call from the loop. You can replace this with il.groupcat.loadSubhalos(..., fields=['SubhaloCM']) before the loop starts, and then just access the correct index of this array in the loop.

Second, your algorithm is a nearest neighbor search, and under the techniques listed there you are doing a "linear search". This has a poor complexity of O(N^2), which can be improved using one of the "space partioning" ideas described there. (In practice, e.g. scipy.cKDTree). If the size of ids is greater than ~1 million to ~1 billion, such an improvement might also be needed.

Harry Chittenden
  • 10 Nov '20

Thanks Dylan,

First of all, I tried using the loadSubhalos function but it doesn't seem to have an ID field, hence why I resorted to loadSingle. Maybe I'm being stupid, but how do you raise this field, assuming it's possible?

The cKDTree thing looks very useful though - I'll give that a try.

Dylan Nelson
  • 10 Nov '20

Hi Harry,

The loadSubhalos() function loads fields for all subhalos, so you don't pass an ID. You have already loaded the data actually, all you need to do is replace

cm = obj['SubhaloCM'] / h


cm = CM100[i,:]
Harry Chittenden
  • 10 Nov '20

Excellent, thanks for your help!

  • Page 1 of 1