Using API on TNG100-1-Dark

Sophia Nasr
  • 7 Apr '20

UPDATE: It seems changing the limit does something, but it doesn't quite allow the scan to go through (I get a timeout). Any ideas on how to fix this?


I did an analysis in the regular TNG100-1 run using the API. Now, I'm trying to do the same thing using TNG100-1-Dark, and am getting strange errors. For example,

ids = [subhalos['results'][i]['id'] for i in range(subhalos['count'])]

should work, since it scans over all available subhalos in that snap. And it does in TNG100-1. But when I try to do the same for snapshot 84 in TNG100-1-Dark, which has over 5,000,000 subhalos, it keeps telling me the index is out of range, and it will only let me scan up to 500, so I have to use

ids = [subhalos['results'][i]['id'] for i in range(500)]

I tested random numbers, going down from 5,000,000 all the way to 500; 500 works, but increasing it to 501 again gives index out of range, despite the fact that print(subhalos['count']) tells me there are 5196947 subhalos in this snapshot. Is there a limit set that I have to override when scanning the dark-matter-only runs? I didn't have to set any limit in the full TNG100-1 run, so I was wondering if perhaps something is different about the Dark run, or if a limit has been placed on your end. Can we navigate this in a way that lets me scan over all the subhalos in a snapshot?

Thank you!

Dylan Nelson
  • 8 Apr '20

Hi Sophia,

Any API request towards a listing endpoint (such as the subhalos search URL in your code) will by default return 100 results. I suspect you have already changed these defaults then? Can you post a minimal working example of code which reproduces the issue?
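
For concreteness, here is a minimal sketch (with a placeholder API key) of the distinction that usually causes this: subhalos['count'] is the total number of matches, while subhalos['results'] holds only the current page of, by default, 100 entries:

import requests

headers = {"api-key": "YOUR_API_KEY_HERE"}  # hypothetical placeholder

url = "http://www.tng-project.org/api/TNG100-1-Dark/snapshots/84/subhalos/"
subhalos = requests.get(url, headers=headers).json()

print(subhalos['count'])         # total number of matches, e.g. 5196947
print(len(subhalos['results']))  # entries on this page only: 100 by default

# request a larger page explicitly
subhalos = requests.get(url, params={'limit': 500}, headers=headers).json()
print(len(subhalos['results']))  # now 500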

Sophia Nasr
  • 8 Apr '20

Hi Dylan,

Here's the code I use to search for specific subhalos in the snapshot:

import requests

headers = {"api-key": "YOUR_API_KEY_HERE"}  # placeholder; set this to your own API key

limNumMassCriteria = 40000
limNum = 400000
mass_dm_min = 10**13 / 1e10 * 0.6774    # 10^13 Msun in catalog units of 1e10 Msun/h (h = 0.6774)
mass_dm_max = 6*10**13 / 1e10 * 0.6774
# mass_stars_min = 8*10**10 / 1e10 * 0.6774
# mass_stars_max = 10**12 / 1e10 * 0.6774
redshift = 0.2

def querySubhalosDM(mass_dm_min, mass_dm_max, limNumMassCriteria, limNum, simRun, redshift, snap):
    def get(path, params=None):
        # make HTTP GET request to path
        r = requests.get(path, params=params, headers=headers)

        # raise exception if response code is not HTTP SUCCESS (200)
        r.raise_for_status()

        if r.headers['content-type'] == 'application/json':
            return r.json()  # parse json responses automatically
        return r

    # form the search_query string by hand for once
    search_query = "?mass_dm__gt=" + str(mass_dm_min) + "&mass_dm__lt=" + str(mass_dm_max)

    # form the url and make the request
    url = "" + simRun + "/snapshots/z=" + str(redshift) + "/subhalos/" + search_query
    subhalos = get(url, {'limit': limNumMassCriteria})

    # return ids of halos falling in query criteria
    ids = [subhalos['results'][i]['id'] for i in range(40000)]
    # ids = [subhalos['results'][i]['id'] for i in range(subhalos['count'])]

    subs = get(snap['subhalos'], {'limit': limNum, 'order_by': 'id'})

    return subhalos, ids, subs
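
For completeness, a hedged sketch of how this function might be called, assuming the headers defined above and the redshift-based snapshot lookup used elsewhere in this code:

# fetch the snapshot metadata dict that querySubhalosDM() expects as 'snap'
# (its 'subhalos' entry is the listing URL used inside the function)
snap_url = "http://www.tng-project.org/api/TNG100-1-Dark/snapshots/z=" + str(redshift) + "/"
snap = requests.get(snap_url, headers=headers).json()

subhalos, ids, subs = querySubhalosDM(mass_dm_min, mass_dm_max,
                                      limNumMassCriteria, limNum,
                                      "TNG100-1-Dark", redshift, snap)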

It looks like it worked this time, but now it's searching for which of those subhalos are primary subhalos, which is taking a long time. I suspect that's because it found many, since my limit is 40,000??

Dylan Nelson
  • 9 Apr '20


To avoid ever producing an error, you should only loop over the actual number of responses.

So ids = [subhalos['results'][i]['id'] for i in range(len(subhalos['results']))].

I would suggest limit=5000 or so, so that each return is faster. Then, because the results are paginated, you need to walk through them.

subhalos = {'next': query_url}

while subhalos['next'] is not None:  # or maybe, while subhalos['next'] != "":
    subhalos = get(subhalos['next'])
    for i in range(len(subhalos['results'])):
        pass  # do something with subhalos['results'][i]
    # note: subhalos['next'] is a new URL pointing to the next page, please see the API documentation
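
Putting the pieces together, here is a self-contained version of that walk. This is a sketch only: the API key is a placeholder, and the example query URL reuses the mass cuts from the code earlier in the thread (677.4 and 4064.4 in units of 1e10 Msun/h):

import requests

headers = {"api-key": "YOUR_API_KEY_HERE"}  # hypothetical placeholder

def get(path, params=None):
    # make HTTP GET request to path
    r = requests.get(path, params=params, headers=headers)
    # raise exception if response code is not HTTP SUCCESS (200)
    r.raise_for_status()
    if r.headers['content-type'] == 'application/json':
        return r.json()  # parse json responses automatically
    return r

query_url = ("http://www.tng-project.org/api/TNG100-1-Dark/snapshots/84/subhalos/"
             "?mass_dm__gt=677.4&mass_dm__lt=4064.4")

ids = []
subhalos = {'next': query_url}
params = {'limit': 5000}

while subhalos['next'] is not None:
    subhalos = get(subhalos['next'], params=params)
    ids.extend(res['id'] for res in subhalos['results'])
    params = None  # the 'next' URL already encodes limit and offset

print(len(ids), "of", subhalos['count'], "ids collected")
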
Sophia Nasr
  • 10 Apr '20

Okay, thank you very much!

Sophia Nasr
  • 20 Aug '20

Hi again Dylan,

I'm posting here as I'm revisiting the TNG100-1-Dark run, using the code above with the adjustment to "ids" you suggested and with limit changed to 5000. When I then try to use these results to find which of the subhalos my query found are primary subhalos in their host, I get back no subhalos, which makes no sense to me; at least one has to be a primary, right?! And this is after it takes about 2 hours, which is also different from the hydrodynamical run (which was more like 30 min or even less). Here's the code I used after running the above with your suggestions:

def findPrimarySubhalos(subs, ids, subhalos, simRun, redshift):
    def get(path, params=None):
        # make HTTP GET request to path
        r = requests.get(path, params=params, headers=headers)

        # raise exception if response code is not HTTP SUCCESS (200)
        r.raise_for_status()

        if r.headers['content-type'] == 'application/json':
            return r.json()  # parse json responses automatically
        return r

    # gets info on subhalos falling within criteria above
    sub = [get(subs['results'][ids[i]]['url'], {'limit': subhalos['count']}) for i in range(len(ids))]

    # finds which subhalos are primary subhalos of host
    id_ones = []
    for i in range(len(sub)):
        if sub[i]['primary_flag'] == 1:
            id_ones.append(sub[i]['id'])

    # gives [subhalo id, host halo id] list
    idNum = []
    grNr = []
    for id in id_ones:
        url = "http://www.tng-project.org/api/" + simRun + "/snapshots/z=" + str(redshift) + "/subhalos/" + str(id)
        subhalo = get(url)
        idNum.append(subhalo['id'])
        grNr.append(subhalo['grnr'])
    id_grnr = list(zip(idNum, grNr))

    return id_grnr
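
As an aside, since the detail JSON fetched in the first loop already contains both 'id' and 'grnr' (the fields used above), the second round of GET requests could in principle be skipped entirely. A hedged one-liner, assuming the sub list from the code above:

# build [subhalo id, host halo id] pairs directly from the already-fetched details
id_grnr = [(s['id'], s['grnr']) for s in sub if s['primary_flag'] == 1]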

I'm wondering if something is going wrong with the initial search?? For example, with the initial code I showed you above, when I check the results in "subs", it shows me subhalos with DM masses below 10^13, which already seems incorrect since my criteria should have cut those out. Any thoughts??

EDIT: the findings in "subs" seem to be similar for the hydrodynamical run, so that's not the problem...

Dylan Nelson
  • 20 Aug '20

Hello Sophia,

I would really suggest to just download the group catalog. Then you can load and immediately find the primary subhalos:

import numpy as np
import illustris_python as il
basePath = 'TNG100-1-Dark/output/'

halos = il.groupcat.loadHalos(basePath, 99, fields=['GroupNsubs','GroupFirstSub'])

w = np.where(halos['GroupNsubs'] > 0)
sub_ids_primary = halos['GroupFirstSub'][w]
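
If the goal is then to apply the same mass selection as the earlier API query, one possibility is sketched below, assuming the standard SubhaloMassType group catalog field and the mass_dm_min/mass_dm_max values defined earlier (both are in units of 1e10 Msun/h):

# DM masses of all subhalos; SubhaloMassType has one column per particle type
mass_type = il.groupcat.loadSubhalos(basePath, 99, fields=['SubhaloMassType'])
mass_dm = mass_type[:, 1]  # column 1 = PartType1 (dark matter)

# same cuts as the API query, applied to the primary subhalos only
mask = (mass_dm[sub_ids_primary] > mass_dm_min) & (mass_dm[sub_ids_primary] < mass_dm_max)
sub_ids_selected = sub_ids_primary[mask]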

Or, if the group catalog is too large, you can download these two datasets separately:

wget --content-disposition --header "API-Key: HERE" "http://www.tng-project.org/api/TNG100-1-Dark/files/groupcat-99/?Group=GroupFirstSub"
wget --content-disposition --header "API-Key: HERE" "http://www.tng-project.org/api/TNG100-1-Dark/files/groupcat-99/?Group=GroupNsubs"

import h5py
import numpy as np

with h5py.File('fof_subhalo_tab_099.Group.GroupFirstSub.hdf5','r') as f:
    GroupFirstSub = f['Group']['GroupFirstSub'][()]
with h5py.File('fof_subhalo_tab_099.Group.GroupNsubs.hdf5','r') as f:
    GroupNsubs = f['Group']['GroupNsubs'][()]

w = np.where(GroupNsubs > 0)
sub_ids_primary = GroupFirstSub[w]
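
These IDs could then be cross-matched against the list returned by the earlier API query; a sketch, assuming the ids list from querySubhalosDM() and that both refer to the same snapshot:

# primaries among the API-selected subhalos
primary_in_selection = np.intersect1d(sub_ids_primary, np.array(ids))
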
Sophia Nasr
  • 20 Aug '20

Hi Dylan, thanks so much for the response. Is it absolutely mandatory to download the actual group catalog to do this, then?? I had just written code that takes me through the API and neatly saves all the subhalo IDs that matter, so I can download their particle data... I suppose it wouldn't be too terrible to do it this way and then get the subhalo IDs from the catalog anyway. But if it's possible to do it via the API, that would be much preferred due to file size. Thanks again!

Dylan Nelson
  • 21 Aug '20

Hi Sophia,

You can do it either way. The download approach is better if you are analyzing large numbers of objects (e.g., all subhalos in a box). The API is better if you're analyzing a small number of objects (e.g., by making a restrictive search that returns only a few results).
