Trouble loading subhalo catalog with Scida

Taylor Curboy
  • 21 Mar

Hello,

I am trying to load the subhalo and star catalogs using Scida and Dask in JupyterLab, but recently ran into a ValueError: "operands could not be broadcast together with shapes (6291349,) (37053,)". It occurs when Scida tries to combine GroupFirstSub and LocalSubhaloID. I have only encountered this after setting up a Client as the Dask scheduler with a custom Dask configuration, so I suspect that is interfering with the data loading somehow, but I'm unsure what is actually happening.
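For reference, the same error message can be reproduced with plain NumPy by combining two 1-D arrays of mismatched lengths (a standalone sketch, not Scida-specific; the array sizes are just the shapes from the traceback):

```python
import numpy as np

# Two 1-D arrays whose lengths differ cannot be broadcast elementwise.
a = np.zeros(6291349)  # e.g. one entry per particle
b = np.zeros(37053)    # e.g. one entry per subhalo

try:
    a + b
except ValueError as e:
    print(e)  # same "operands could not be broadcast together" error
```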

import dask
from dask.distributed import Client
from scida import load

with dask.config.set({
    "array.chunk-size": "8MB",
    "distributed.worker.memory.target": 0.6,
    "distributed.worker.memory.spill": 0.7,
}):
    client = Client(n_workers=2, memory_limit="4.5GB")

    sim = load("TNG100-1")
    basepath = '../sims.TNG/TNG100-1/output/'
    snap = sim.get_dataset(redshift=0.0)
    current_snap = 99

    halodata = snap.data["Subhalo"]
    stars = snap.data["PartType4"]
I'd really appreciate any input on what may be happening.

Best wishes,
Taylor Curboy

Chris Byrohl
  • 21 Mar

Dear Taylor,

thanks for the detailed report. This should not happen and appears to be a bug. As a workaround in the meantime, increase the chunk size to 128-256MB (in dask.config.set). Could you confirm that works? Broadly speaking, 8MB is too small for efficient workloads anyway, even though I understand TNGLab resources might constrain this. I can take a closer look in about a week.
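Concretely, the workaround would look something like this (a sketch of the dask.config change only; the rest of your snippet stays as-is):

```python
import dask

# Raise the default Dask array chunk size from 8MB to the 128-256MB range.
with dask.config.set({"array.chunk-size": "256MB"}):
    # ... create the Client and call scida's load() here as before ...
    assert dask.config.get("array.chunk-size") == "256MB"
```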

Best,
Chris

Taylor Curboy
  • 22 Mar

Dear Chris,

Thanks so much for the suggestion, increasing the chunk size did prevent this error.
