I am trying to acquire the merger trees of specific subhalos I am analyzing at z=0. In more detail, I am analyzing certain host halos along with their satellite halos: I take their subhalo IDs at z=0 in Illustris-1 and then extract their 3D positions and 3D velocities from their SubLink progenitors throughout the whole simulation. Instead of trying to download the terabytes of data that my computer cannot possibly hold, I have been doing the analysis through the web API.
I am very happy to say that I can do the analysis I want for a central subhalo and its satellite using the SubLink progenitor branches. However, I can only do it for one host subhalo and one satellite subhalo at a time. If I were to do this for all 1400 host subhalos along with their satellites (2800 progenitor trees in total), I suspect the server would crash, which would be very unfortunate in my case, though not surprising given how much data that is to extract from the server.
I looked at the API cookbook and saw in task 11 that you can specify the fields you want to download (such as position, velocity, halo mass) instead of every field in the snapshot. Can I do the same with the main progenitor branch of my subhalos, given their unique IDs (say hID for the host halo and sID for the satellite halo), requesting only a few fields (params = {snap, id, mass_log_msun, pos_x, pos_y, pos_z, vel_x, vel_y, vel_z}) instead of all of them?
Put more simply: is there a method analogous to task 11 where I can save the SubLink progenitor merger tree for each ID I want, keeping only specific quantities (snapshot number (or redshift), position, velocity, id, and mass_log_msun), all in one HDF5 file (or a few)?
I also ask because there is an example that requests the main progenitor branch from the SubLink merger trees of one subhalo through the web API and saves it as a sublink_mpb file. I would rather not do that for each subhalo, because that would leave too many HDF5 files in my directory to go through and analyze.
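As a starting point, we can extract the progenitor branch for each host ID:
```python
# get() and base_url as defined in the API cookbook; hID holds the host subhalo IDs
for sub_id in hID:
    file_url = base_url + "snapshots/135/subhalos/" + str(sub_id) + "/sublink/mpb.hdf5"
    filename = get(file_url)  # saves the file and returns its name
    print(filename)
```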
This saves an individual file for each subhalo's branch. How can I save all the MPBs into one HDF5 file (or a few)?
I do not think the params argument works the same way as in the cookbook sample if I just want params = {'snap', 'id', 'pos_x', 'pos_y', 'pos_z', 'vel_x', 'vel_y', 'vel_z', 'mass_log_msun'}. If that is the case and field selection is not possible, then I don't mind getting all the data for each progenitor branch.
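The MPB files themselves do not use the search-field names in that set, so the desired quantities have to be derived after loading a branch. The sketch below assumes the datasets follow the SubLink data specification (SnapNum, SubfindID, SubhaloPos, SubhaloVel, SubhaloMass), that SubhaloMass is in units of 1e10 Msun/h, and that h = 0.704 for Illustris-1; extract_quantities is a hypothetical helper name, not part of any API.

```python
import numpy as np

H = 0.704  # Illustris-1 Hubble parameter (assumed)

def extract_quantities(branch):
    """Map SubLink MPB datasets onto the per-snapshot quantities wanted.

    `branch` is any mapping of dataset name -> array, e.g. an open
    h5py.File or a plain dict. Dataset names are assumptions.
    """
    pos = np.asarray(branch["SubhaloPos"])    # shape (N, 3)
    vel = np.asarray(branch["SubhaloVel"])    # shape (N, 3)
    mass = np.asarray(branch["SubhaloMass"])  # shape (N,), assumed 1e10 Msun/h
    out = {
        "snap": np.asarray(branch["SnapNum"]),
        "id": np.asarray(branch["SubfindID"]),
        "pos_x": pos[:, 0], "pos_y": pos[:, 1], "pos_z": pos[:, 2],
        "vel_x": vel[:, 0], "vel_y": vel[:, 1], "vel_z": vel[:, 2],
    }
    # log10 of mass in Msun; zero-mass entries give -inf, so silence that warning
    with np.errstate(divide="ignore"):
        out["mass_log_msun"] = np.log10(mass * 1e10 / H)
    return out
```

Passing an open h5py.File works because each dataset is converted to an in-memory array with np.asarray.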
Any help will be appreciated,
Alex
Dylan Nelson
6 Mar '18
Hi Alexandres,
No, we never implemented a "selective load (by fields)" for merger tree related data. The idea was that the trees of individual objects are already quite small, particularly so for MPBs, so there wasn't much need.
I think requesting 1400 or 2800 trees from the server will be fine, although I'm not sure whether this will take an hour or a day to complete. Let me know.
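A minimal sketch of such a batch download using only the standard library, in place of the cookbook's requests-based get(). It assumes the URL pattern from the loop in the question, an "api-key" request header, a hypothetical file naming scheme sublink_mpb_<id>.hdf5, and that the server answers with HTTP 404 when a subhalo has no SubLink tree; skipping existing files lets the loop be restarted if it is interrupted.

```python
import os
import urllib.error
import urllib.request

BASE_URL = "http://www.illustris-project.org/api/Illustris-1/"
API_KEY = "INSERT_API_KEY_HERE"  # placeholder: use your own key

def mpb_url(sub_id, snap=135, base_url=BASE_URL):
    """URL of the SubLink main progenitor branch for one subhalo."""
    return f"{base_url}snapshots/{snap}/subhalos/{sub_id}/sublink/mpb.hdf5"

def download_mpbs(sub_ids, outdir="mpb", snap=135):
    """Save each MPB as <outdir>/sublink_mpb_<id>.hdf5 and return IDs with no tree."""
    os.makedirs(outdir, exist_ok=True)
    missing = []
    for sub_id in sub_ids:
        path = os.path.join(outdir, f"sublink_mpb_{sub_id}.hdf5")
        if os.path.exists(path):
            continue  # already downloaded on a previous run
        req = urllib.request.Request(mpb_url(sub_id, snap), headers={"api-key": API_KEY})
        try:
            # the output file is only opened once the request has succeeded
            with urllib.request.urlopen(req) as resp, open(path, "wb") as f:
                f.write(resp.read())
        except urllib.error.HTTPError as err:
            if err.code == 404:  # assumed response when no SubLink tree exists
                missing.append(sub_id)
            else:
                raise
    return missing
```

For example, missing = download_mpbs(list(hID) + list(sID)) would fetch hosts and satellites in one pass and report the subhalos without a tree.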
If you do this, then yes, the simplest thing would be to first save 1400 or 2800 individual HDF5 files. You could then write a script which loops over these files, loads just the fields you care about, and saves the results into one new master HDF5 file for further analysis.
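One way such a merge script could look, sketched with h5py. It assumes each branch was saved as sublink_mpb_<id>.hdf5 (a hypothetical naming scheme), that the wanted datasets sit at the top level of each file under their SubLink names, and it writes one group per z=0 subhalo ID into the master file; merge_mpbs and FIELDS are illustrative names.

```python
import glob
import os
import re

import h5py

# Datasets to keep; names assumed from the SubLink data specification
FIELDS = ["SnapNum", "SubfindID", "SubhaloPos", "SubhaloVel", "SubhaloMass"]

def merge_mpbs(indir, master_path, fields=FIELDS):
    """Copy selected fields of every sublink_mpb_<id>.hdf5 into one file.

    The master file gets one group per subhalo ID, e.g. /399780/SnapNum.
    Returns the number of branches written.
    """
    pattern = re.compile(r"sublink_mpb_(\d+)\.hdf5$")
    n_written = 0
    with h5py.File(master_path, "w") as master:
        for path in sorted(glob.glob(os.path.join(indir, "sublink_mpb_*.hdf5"))):
            match = pattern.search(os.path.basename(path))
            if match is None:
                continue  # file name does not carry a subhalo ID
            grp = master.create_group(match.group(1))
            with h5py.File(path, "r") as src:
                for name in fields:
                    if name in src:  # skip fields absent from this file
                        grp.create_dataset(name, data=src[name][()])
            n_written += 1
    return n_written
```

Afterwards a single file can be opened and, for instance, master["399780/SubhaloPos"][()] gives one host's position history.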
Otherwise, you can of course download the whole SubLink tree (345 GB for Illustris-1).
Thank you for the reply. I went ahead with what you recommended and everything is working smoothly. Fortunately, extracting the progenitor trees takes only about an hour for ~1500 subhalos.
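One issue, or really a question, I want to raise is that several of the subhalos at snapshot 135 are missing SubLink trees. The IDs in particular are
[399780 413343 416446 425842 426542 426627 431553 432734 436128 440160 440183 441413 443459 443835]
For one of the subhalos: http://www.illustris-project.org/api/Illustris-1/snapshots/135/subhalos/399780/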
It appears to have no progenitor snapshot or ID, and clicking the SubLink MPB link also fails because the file is not available on the server. From previous posts this seems to be quite rare (14 out of ~1400 in my sample), but is it because SubLink found the region around this subhalo too dense to identify it, or is it an error that could be fixed on your end? I also downloaded the LHaloTree MPBs and found that these subhalos appear in fewer than 5 snapshots.
What are your thoughts on this?
Dylan Nelson
25 Mar '18
This is correct, in the sense that SubLink wasn't able to create a tree for that object.
I would almost certainly not use such subhalos in your analysis, because this likely indicates a spurious object (i.e. not really a cosmological subhalo), such as a transient clump of particles or a fragmentation of particles within a galaxy. Whether to include them depends on your exact science question.
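If you decide to exclude them, such subhalos can be flagged before requesting any trees. This sketch uses the fact, noted above, that the affected subhalos list no progenitor snapshot or ID; it assumes the parsed subhalo JSON exposes these as "prog_snap" and "prog_sfid", with -1 (or null) meaning "none". Lacking a progenitor is used here only as a proxy for lacking a SubLink tree, and both function names are hypothetical.

```python
def _is_valid_index(value):
    """True for a real snapshot/subhalo index; None or negative means absent."""
    return value is not None and value >= 0

def has_sublink_progenitor(subhalo):
    """True if one subhalo's API record lists a progenitor (assumed key names)."""
    return (_is_valid_index(subhalo.get("prog_snap"))
            and _is_valid_index(subhalo.get("prog_sfid")))

def split_by_tree(subhalos):
    """Split API records into (IDs to analyze, IDs to exclude)."""
    keep, drop = [], []
    for sub in subhalos:
        (keep if has_sublink_progenitor(sub) else drop).append(sub["id"])
    return keep, drop
```

The records would come from the per-subhalo JSON endpoint, e.g. the 399780 URL above.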