Trouble loading subhalo catalog with Scida

Taylor Curboy
  • 21 Mar

Hello,

I am trying to load the subhalo and star catalogs using Scida and Dask in JupyterLab, but recently ran into a ValueError: "operands could not be broadcast together with shapes (6291349,) (37053,)". It occurs when Scida tries to combine GroupFirstSub and LocalSubhaloID. I have only encountered this after setting up a Client as the Dask scheduler with a custom Dask configuration, so I suspect that is interfering with the data loading somehow, but I'm unsure what is actually happening.
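For reference, the same error message can be reproduced with plain NumPy by combining two 1-D arrays of mismatched lengths (a standalone sketch, not Scida-specific; the array sizes are just the shapes from the traceback):

```python
import numpy as np

# Two 1-D arrays whose lengths differ cannot be broadcast elementwise.
a = np.zeros(6291349)  # e.g. one entry per particle
b = np.zeros(37053)    # e.g. one entry per subhalo

try:
    a + b
except ValueError as e:
    print(e)  # same "operands could not be broadcast together" error
```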

import dask
from dask.distributed import Client
from scida import load

with dask.config.set({
    "array.chunk-size": "8MB",
    "distributed.worker.memory.target": 0.6,
    "distributed.worker.memory.spill": 0.7,
}):
    client = Client(n_workers=2, memory_limit="4.5GB")

    sim = load("TNG100-1")
    basepath = '../sims.TNG/TNG100-1/output/'
    snap = sim.get_dataset(redshift=0.0)
    current_snap = 99

    halodata = snap.data["Subhalo"]
    stars = snap.data["PartType4"]
I'd really appreciate any input on what may be happening.

Best wishes,
Taylor Curboy

Chris Byrohl
  • 21 Mar

Dear Taylor,

thanks for the detailed report. This should not happen and appears to be a bug. As a workaround in the meantime, increase the chunk size to 128-256MB (in dask.config.set). Could you confirm that works? Broadly speaking, 8MB is too small for efficient workloads anyway, even though I understand TNGLab resources might constrain this. I can take a closer look in about a week.
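Concretely, the workaround would look something like this (a sketch of the dask.config change only; the rest of your snippet stays as-is):

```python
import dask

# Raise the default Dask array chunk size from 8MB to the 128-256MB range.
with dask.config.set({"array.chunk-size": "256MB"}):
    # ... create the Client and call scida's load() here as before ...
    assert dask.config.get("array.chunk-size") == "256MB"
```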

Best,
Chris

Taylor Curboy
  • 22 Mar

Dear Chris,

Thanks so much for the suggestion, increasing the chunk size did prevent this error.
