Parallel read option for GCR #621

patricialarsen · 2023-02-17T23:50:04Z

Adding parallel read using the config overwrite functionality and normal GCR.
Each MPI rank passes its `mpi_rank` and `mpi_size` through `config_overwrite` when loading a catalog, and then reads only its own share of the data.

An easy way to test this: in a Jupyter environment, open a terminal and run

```bash
source /global/common/software/lsst/common/miniconda/setup_current_python.sh
```

and then test with

```bash
mpirun -np 1 python gcr_test.py
```

using different numbers of processes (vary the `-np` value), where `gcr_test.py` contains:

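```python
import sys

# Point this at your local gcr-catalogs checkout (placeholder path)
path_to_gcrcatalogs = '/path/to/gcr-catalogs'
sys.path.insert(0, path_to_gcrcatalogs)

import numpy as np
from mpi4py import MPI
import GCRCatalogs

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

catalog = 'cosmoDC2_v1.1.4_small'
quantities = ['ra']

# Each rank loads only its own share of the catalog
rank_cat = GCRCatalogs.load_catalog(catalog, config_overwrite={'mpi_rank': rank, 'mpi_size': size})
data_rank = rank_cat.get_quantities(quantities)

print("on rank", rank, "of", size, "we have", len(data_rank['ra']), "galaxies in", catalog)

def send_to_master(value):
    """Gather a 1-D array from all ranks onto rank 0; other ranks get None."""
    # MPI.DOUBLE needs a contiguous float64 buffer (fine for floating-point
    # quantities such as 'ra'; large integer IDs would lose precision)
    value = np.ascontiguousarray(value, dtype=np.float64)
    count = len(value)
    counts = comm.allgather(count)  # every rank learns every rank's length
    displs = np.array([sum(counts[:p]) for p in range(size)])  # start offset of each rank's block
    if rank == 0:
        recvbuf = np.zeros(sum(counts), dtype=np.float64)
    else:
        recvbuf = None  # the receive buffer is only used on the root
    comm.Gatherv([value, MPI.DOUBLE], [recvbuf, counts, displs, MPI.DOUBLE], root=0)
    return recvbuf

data_master = {}
for quantity in quantities:
    data_master[quantity] = send_to_master(data_rank[quantity])

if rank == 0:
    print("master rank has", len(data_master['ra']), "galaxies in", catalog)
```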
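The `counts`/`displs` bookkeeping in `send_to_master` is the part of `Gatherv` that is easiest to get wrong. As a minimal sketch (pure NumPy, no MPI; the helper names `gather_layout` and `simulate_gatherv` are hypothetical), this reproduces how the root's receive buffer is laid out from the per-rank lengths:

```python
import numpy as np


def gather_layout(counts):
    """Return (total length, displacements) for Gatherv-style concatenation."""
    counts = np.asarray(counts, dtype=np.int64)
    # Each rank's block starts where the previous ranks' blocks end
    displs = np.concatenate(([0], np.cumsum(counts)[:-1]))
    return int(counts.sum()), displs


def simulate_gatherv(blocks):
    """Place each rank's block at its displacement in one buffer (no MPI involved)."""
    total, displs = gather_layout([len(b) for b in blocks])
    recvbuf = np.zeros(total, dtype=np.float64)
    for block, start in zip(blocks, displs):
        recvbuf[start:start + len(block)] = block
    return recvbuf


# Three "ranks" holding 3, 2 and 4 values respectively
blocks = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0]), np.array([6.0, 7.0, 8.0, 9.0])]
total, displs = gather_layout([len(b) for b in blocks])
print(total, displs.tolist())  # 9 [0, 3, 5]
```

Because the data from rank 0 lands first, then rank 1, and so on, the gathered array on the root preserves rank order.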


patricialarsen · 2023-02-17T23:56:15Z

There are a few changes I still need to make:

- first, fix the permissions on the files
- second, merge updates from the main branch

Some things to note:

- If you try to load a catalog in parallel that does not support this functionality, a `RuntimeError` is raised.
- The catalogs I've updated are essentially all those that support native filters, with a handful of exceptions where I don't yet understand the underlying data structure or the reader well enough to do this safely.
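The PR text does not spell out how a reader divides its data among ranks, so the following is only a sketch under assumptions: a reader whose data is stored in natively filterable chunks (e.g. tracts or healpix pixels) hands each rank a round-robin share, and a reader without that ability raises `RuntimeError` when MPI keys are passed. `supports_mpi`, `check_mpi_support`, `assign_chunks`, and the two reader classes are hypothetical names, not part of GCRCatalogs.

```python
def check_mpi_support(reader_cls, config):
    """Raise RuntimeError if MPI keys are requested for a reader that cannot split its data."""
    wants_mpi = "mpi_rank" in config or "mpi_size" in config
    if wants_mpi and not getattr(reader_cls, "supports_mpi", False):
        raise RuntimeError(f"{reader_cls.__name__} does not support parallel reads")


def assign_chunks(chunks, mpi_rank, mpi_size):
    """Round-robin share of natively filterable chunks for one rank."""
    if not 0 <= mpi_rank < mpi_size:
        raise ValueError("mpi_rank must satisfy 0 <= mpi_rank < mpi_size")
    return list(chunks)[mpi_rank::mpi_size]


class ChunkedReader:  # hypothetical reader with per-chunk native filters
    supports_mpi = True


class MonolithicReader:  # hypothetical reader that cannot be split safely yet
    pass


config = {"mpi_rank": 1, "mpi_size": 3}
check_mpi_support(ChunkedReader, config)  # passes silently
print(assign_chunks(range(8), **config))  # [1, 4, 7]
```

Round-robin is just one choice; contiguous blocks (for instance via `np.array_split`) would work equally well when chunks are of similar size. Either way, every chunk is read by exactly one rank.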

yymao · 2023-02-20T01:28:38Z

Thank you @patricialarsen! Sorry, I only had a bit of time to skim over it, and I'll take a closer look soon.

I do have a question about the permissions. Can you explain the issue a bit more? It's OK if we have to change the permissions, but I'm worried that people may not know about this when creating new configs. Is there an alternative that is more future-proof?

patricialarsen · 2023-02-20T19:47:22Z

We don't need to change the permissions. For sprint-week events I've had people link to my local repository to access the reader, and the permissions were altered in doing so, so I need to reset them back to the default.
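Resetting the modes back to git's default for non-executable files (100644, i.e. `rw-r--r--`) can be scripted. This is a minimal POSIX-only sketch, assuming only the executable bits need clearing; `reset_to_default_mode` is a hypothetical helper, and the glob patterns and target directory are placeholders for wherever the affected files live:

```python
import stat
from pathlib import Path

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def reset_to_default_mode(root, patterns=("*.yaml", "*.py")):
    """Set files under root that have any executable bit to 0o644; return the changed paths."""
    changed = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            if path.is_file() and stat.S_IMODE(path.stat().st_mode) & EXEC_BITS:
                path.chmod(0o644)  # rw-r--r--, which git records as 100644
                changed.append(path)
    return changed


# Usage (placeholder path):
# reset_to_default_mode("path/to/configs")
```

`pathlib` handles file names containing spaces without any quoting, which is easy to get wrong in a shell loop.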

patricialarsen · 2023-02-20T20:04:46Z

As a side note, I believe we can make the permission settings more general using the `core.fileMode` and `core.sharedRepository` config settings, though I'm not entirely sure how these work. Setting `core.fileMode` to false should stop git from tracking permission changes in the repository, which would let me change local permissions without causing these problems.


patricialarsen · 2023-02-21T01:45:50Z

Note that this pull request also adds readers for the DP0.2 object catalogs.

yymao · 2023-03-06T03:19:31Z

@patricialarsen thanks for updating the PR and sorry for the delay. Looking at the changes to the readers, I wonder if it's worth creating a new base class, say `BaseMPIGenericCatalog`:

```python
class BaseMPIGenericCatalog(BaseGenericCatalog):
    def __init__(self, **kwargs):
        self._rank = int(kwargs.pop('mpi_rank'))
        self._size = int(kwargs.pop('mpi_size'))
        super().__init__(**kwargs)
```

This way, it's clearer which readers support MPI (and in those cases, `mpi_rank` and `mpi_size` are required). For readers that don't support MPI (which still use `BaseGenericCatalog`), any additional kwargs are simply ignored, which is more consistent with the current behavior.
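A runnable sketch of the proposed hierarchy. Because GCR isn't imported here, `BaseGenericCatalog` is a minimal stand-in, and `TractCatalog`, `SerialCatalog`, and `supports_mpi` are hypothetical. The sketch also swaps the bare `KeyError` from `kwargs.pop` for a `ValueError` naming the missing keys, which is an assumption rather than part of the proposal:

```python
class BaseGenericCatalog:
    """Stand-in for GCR's BaseGenericCatalog (hypothetical; only stores leftover config)."""

    def __init__(self, **kwargs):
        self._config = kwargs


class BaseMPIGenericCatalog(BaseGenericCatalog):
    """Base class for readers that support MPI; mpi_rank and mpi_size are required."""

    def __init__(self, **kwargs):
        missing = [key for key in ("mpi_rank", "mpi_size") if key not in kwargs]
        if missing:
            raise ValueError(f"MPI-enabled reader requires {missing}")
        self._rank = int(kwargs.pop("mpi_rank"))
        self._size = int(kwargs.pop("mpi_size"))
        super().__init__(**kwargs)


class TractCatalog(BaseMPIGenericCatalog):
    """Hypothetical MPI-aware reader: keeps only this rank's share of tracts."""

    def __init__(self, tracts, **kwargs):
        super().__init__(**kwargs)
        self.local_tracts = list(tracts)[self._rank::self._size]


class SerialCatalog(BaseGenericCatalog):
    """Hypothetical reader without MPI support; extra kwargs are not acted on."""


def supports_mpi(reader_cls):
    # With a dedicated base class, MPI support is a simple subclass check
    return issubclass(reader_cls, BaseMPIGenericCatalog)


cat = TractCatalog(range(5), mpi_rank=1, mpi_size=2)
print(cat.local_tracts)  # [1, 3]
```

With this layout, deciding whether a reader can be loaded in parallel becomes an `issubclass` check rather than a per-reader flag.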

Thoughts?

P and others added 12 commits March 28, 2022 12:41

added parallel option for cosmodc2 and dc2_object catalogs

d559a1a

Merge branch 'LSSTDESC:master' into parallel

6ad32a7

Merge branch 'LSSTDESC:master' into parallel

9832bc3

Merge pull request #1 from patricialarsen/parallel

8b480b5

Parallel

adding missing yaml file

1edebed

not sure why this is necessary

4d0c4b4

Merge branch 'master' of https://github.com/patricialarsen/gcr-catalogs

added __init__ changes

b1dc858

updated dc2_dm_catalog

16356d5

Added DP02 catalog config files

1023c68

working commit of composite catalog reader

770a61c

added LSST object and permissions

4c97399

altered parallel read method to merge

d64731b

patricialarsen requested a review from yymao February 18, 2023 00:03

fix file permissions, from 100755 to 100644

43f43bd

patricialarsen mentioned this pull request Feb 20, 2023

MPI compatibility yymao/generic-catalog-reader#38

Closed

P added 8 commits February 20, 2023 12:34

further permissions fixes, a few files with spaces in the name weren't converted

2bf11d8

merge with upstream changes from LSSTDESC/gcr-catalogs

128d379

tidying - deleted obsolete dp02_object file, removed unused catalog name

6212ee1

simplification of lsst_object catalog definition

0d0e210

lsst_object.py improvements

37e7ed1

tidying and checking of dp0.2 reader

c88b8b9

replaced _flux_to_mag with convert_nanoJansky_to_mag for truth catalog in dc2_object

5594c52

minor typo fixes

88a3e36

P added 2 commits February 20, 2023 18:17

added test versions of catalogs with one tract only

3036d73

minor bug fixes

d8de6f3
