match snapshot particles with their halo/subhalo

Flaminia Fortuni
  • 4 May '20

Hello!
Is there a way to connect star particles (in snapshot files) with the subhalo/halo they belong to? Something like an index or similar. I search for it but I didn't find anything.
Thank you very much.

Dylan Nelson
  • 6 May '20

Hi Flaminia,

In general, particles don't "know" who their parent subhalos or halos are. This is just to avoid storing more numbers. However, this can be reconstructed in reverse, since every subhalo or halo knows who its member particles are. My function for this (subhalos) looks something like this:

def inverseMapPartIndicesToSubhaloIDs(sP, indsType, ptName, debug=False, flagFuzz=True,
                                      SubhaloLenType, SnapOffsetsSubhalo):
    """ For a particle type ptName and snapshot indices for that type indsType, compute the
        subhalo ID to which each particle index belongs. 
        If flagFuzz is True (default), particles in FoF fuzz are marked as outside any subhalo,
        otherwise they are attributed to the closest (prior) subhalo.
    """
    gcLenType = SubhaloLenType[:,sP.ptNum(ptName)]
    gcOffsetsType = SnapOffsetsSubhalo[:,sP.ptNum(ptName)][:-1]

    # val gives the indices of gcOffsetsType such that, if each indsType was inserted
    # into gcOffsetsType just -before- its index, the order of gcOffsetsType is unchanged
    # note 1: (gcOffsetsType-1) so that the case of the particle index equaling the
    # subhalo offset (i.e. first particle) works correctly
    # note 2: np.ss()-1 to shift to the previous subhalo, since we want to know the
    # subhalo offset index -after- which the particle should be inserted
    val = np.searchsorted( gcOffsetsType - 1, indsType ) - 1
    val = val.astype('int32')

    # search and flag all matches where the indices exceed the length of the
    # subhalo they have been assigned to, e.g. either in fof fuzz, in subhalos with
    # no particles of this type, or not in any subhalo at the end of the file
    if flagFuzz:
        gcOffsetsMax = gcOffsetsType + gcLenType - 1
        ww = np.where( indsType > gcOffsetsMax[val] )[0]

        if len(ww):
            val[ww] = -1

    if debug:
        # for all inds we identified in subhalos, verify parents directly
        for i in range(len(indsType)):
            if val[i] < 0:
                continue
            assert indsType[i] >= gcOffsetsType[val[i]]
            if flagFuzz:
                assert indsType[i] < gcOffsetsType[val[i]]+gcLenType[val[i]]
                assert gcLenType[val[i]] != 0

    return val
Flaminia Fortuni
  • 8 May '20

Thank you, very useful. Obviously, I need the "Groupcat" files relative to the chosen snapshot, right?

Dylan Nelson
  • 8 May '20

Yes that's right

Flaminia Fortuni
  • 2
  • 13 May '20

ok, thanks! A couple of last trivial questions, I miss the meaning of "sP" , "ptName" and "SnapOffsetsSubhalo".
The first two, sP and ptName, look like some class object or class method (I say that becaus of this: sP.ptName(ptName)). I think ptName could be, e.g. for star particles, ptName = il.snapshot.partTypeNum('stars'), but what about sP, what should I pratically give in input in place of sP (maybe the specific snapshot I'm working at, e.g. 99 for z=0, or the index of a particle in that specific snapshot or whatever)?
And also, what is SnapOffsetsSubhalo?
thank you very much again.

Dylan Nelson
  • 13 May '20

Hi,

snapOffsetsSubhalo are the offsets into the snapshot, by subhalo, per type. (Subhalo/SnapByType in that documentation).

sP.ptNum(ptName) is just 4 for stars.

Flaminia Fortuni
  • 30 Sep '20

Hello again. I have another question for you on this topic. What if I extract particle datas not with the illustris_python library for python, but reading them directly with an HDF5 library in c++? I need c++ for a faster elaboration. Now I have this big dataset of star particles extracted from snapshot chunks at a certain snapshot. How do I connect these star particles extracted from the snapshot chunks with their subhaloID? I read on the data release paper that the data are organized according to their group/subgroup membership, but I don't know how to go on. Thank you for your help.

Dylan Nelson
  • 1 Oct '20

Hi Flaminia,

Yes you can use c++ to read hdf5 files, I suggest you follow the simplest example on the HDF5 C++ API.

Second, the function above was my example for how to connect particles to their parent (sub)halos. If you load a data array for all the particles of one type in a snapshot, then the index of that array gives you the group (or subgroup) that particle belongs to. Namely, if the index is between offset[i] and offset[i]+length[i] then it belongs to (sub)halo i, where "offset" and "length" are as described above.

Flaminia Fortuni
  • 5 Oct '20

Thank you for your answer, I will try to write something in c++ following your suggestions.
Just another comment, I read and count the star particles with hdf5 c++ API at a certain snapshot, but it seems that there are about 100 k particles more than the number of stars particles written in the header of the snapshot files. Is there a reason, maybe the exceeding particles are the "outer fuzz"? If it is the case, I can dismiss them maybe with the "inverseMapPartIndicesToSubhaloIDs"? Thank you.

Dylan Nelson
  • 5 Oct '20

Hi Flaminia,

The number of particles of a given type, summed across all file chunks, always equals the number in the header. If this isn't true, there may be a bug in your reading code.

Note there is one possible complication: the total number of particles in the header is saved in two separate attributes, you must compute the value as in snapshot.py for example.

Charles Walker
  • 13 Nov '20

Hello Dylan,

I have a question regarding the code you posted on 6th May.

Do I understand correctly that in the line:

    gcOffsetsMax = gcOffsetsType + gcLenType - 1
  • gcOffsetsType contains indexes which refer to the 'offsets' (number of gas cells) into the snapshot each subhalo resides. There is one number per subhalo.
  • gcLenType is the number of gas cells belonging to each subhalo in the snapshot. There is one number per subhalo.
  • gcOffsetsMax is therefore the second-to-last gas cell index into the snapshot belonging to a given subhalo (start+length-1). And there is one number per subhalo.

If so, how can you broadcast gcOffsetsType and gcLenType together, given that you earlier write:

gcOffsetsType = SnapOffsetsSubhalo[:,sP.ptNum(ptName)][:-1]

therfore reducing the size of gcOffsetsType by 1, making it equal (nsubhalos-1) in length and therefore unequal to the size of gcLenType?

Cheers,

Charlie

Dylan Nelson
  • 13 Nov '20

Hi Charlie,

Sorry I should have posted more complete code, in particular SubhaloLenType is exactly this field from the group catalogs. And SnapOffsetsSubhalo is the dataset "Subhalo/SnapByType" from the offsets file except that it has an extra entry on the end due to the way I compute this quickly, apologies for the confusion. After the SnapOffsetsSubhalo[:,sP.ptNum(ptName)][:-1] then this is identical to the dataset in the offsets file.

For TNG100-1 at z=0 all these have the same shape (4371211,).

Flaminia Fortuni
  • 3 Mar

Hi Dylan,
does it make sense to use inverseMapPartIndicesToSubhaloIDs function with PartType0, instead of PartType4? Thank you.

Dylan Nelson
  • 3 Mar

Yes it works the same for any particle type.

Flaminia Fortuni
  • 3
  • 10 May

Dear Dylan,
sorry if I come back to this question, but something is strange in my results.
I applied your algorithm to match particleID and SubhaloID (translated to C++, and checked with python). Then I tried to plot the comoving coordinates (x,y,z) of some of these subhalos, just to have a 3D visualization of the subhalos composed by star particles and gas particles with the same SubhaloID. Some of these subhalos, as the first plot, seem to be well arranged, with star particles in the center and surrounded by a gas cloud; the strange thing, that bother me, is that the disposition of some of these subhalos seem to not have a meaning, with the star particles totally apart from gas particles. Is there a physical/computational reason, or something is not working in the associating algorithm? Thank you.
SubhaloID=0 at snap=33 (red: star particles, violet: gas particles) : 3dSH0.png
SubhaloID=1 at snap=33 (red: star particles, violet: gas particles) : 3dSH1.png

Dylan Nelson
  • 10 May

Hello,

If you just load these particles normally:

stars_pos = il.snapshot.loadSubhalo(basePath,33,1,'stars',['Coordinates'])
gas_pos = il.snapshot.loadSubhalo(basePath,33,1,'gas',['Coordinates'])

Is the result the same?

Flaminia Fortuni
  • 10 May

Ok, I checked stars_pos and gas_pos against "my" stars and gas position, with Subhalo IDs extracted with your algorithm "transposed" in C++. The stellar component is identical, but the gas component is totally different. I can't actually find a good reason for this.

Flaminia Fortuni
  • 2
  • 11 May

Maybe the trouble is due to the fact that I'm using the index i as particleID instead of PartTypeX/ParticleIDs in place of what you call indsType in your inverseMapPartIndicesToSubhaloIDs function? Thank you again.

Dylan Nelson
  • 11 May

Hello Flaminia,

I am not sure what you mean by "using the index i as particleID". Perhaps you can write again what you are trying to accomplish? If you wish to know/load the positions of cells/particles which belong to a given subhalo, you can load them directly from the snapshot with il.snapshot.loadSubhalo(). There is no need to use this function inverseMapPartIndicesToSubhaloIDs unless you have a set of cell/particle indices and you need to know which subhalo they came from.

Flaminia Fortuni
  • 11 May

Hello,
sure, I try to explain what I'm trying to accomplish.
I'm writing a code in C++; in this code I read star and gas particles from the snapshot files, so in principle I don't know to which Subhalo these particles belong. After reading the particles, I want to assign to these particles their SubhaloIDs: to do so, I translated your inverseMapPartIndicesToSubhaloIDsin C++, so at any snapshot, I have not only the particleID, but also the SubhaloIDs to which these particles belong. This is useful for me because I want to derive some Subhalo properties on a particle-based analysis. I know that you have already written very useful and functioning routines in python, but I need to carry on my analysis in the way I have just described.

Than, by "using the index i as particleID" I mean the same thing written here, but for particles.

Thanks again.

Dylan Nelson
  • 11 May

I suspect the issue is that you are using "local" rather than "global" indices for indsType. These need to be the indices with respect to the entire snapshot, not within each file chunk (hdf5 file).

Because there are many more gas than stars, this is probably working because you are still in the first file for stars, but not in the first file for gas.

Flaminia Fortuni
  • 11 May

Ok, I solved the problem, now everything works correctly. The problem was that I used gcOffsetsType[4] and gcLenType[4] when matching particleID and subhaloID for gas particles. I had not noticed that these quantities exist for any type of particle. Thank you Dylan, your suggestions bring me to solve this issue.

  • Page 1 of 1