-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enum-like classes cant be used as dictionary keys #960
Comments
Tracking pydantic/pydantic#9453 |
This is actually more serious that I initially gave it credit for. This pattern completely prevents users from using the import examples
from examples.fip_ophys_rig import rig
from aind_data_schema_models.modalities import Modality
rig.modalities = set([Modality.BEHAVIOR])
rig.model_dump() raises the following error: raceback (most recent call last):
File ".\aind-data-schema\deserialization_order.py", line 27, in <module>
rig.model_dump()
File ".\aind-data-schema\.venv\Lib\site-packages\pydantic\main.py", line 314, in model_dump
return self.__pydantic_serializer__.to_python(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unhashable type: 'dict' |
It seems like an issue with the pydantic serializer. Even this causes issues: from typing import Set
from pydantic import BaseModel, ConfigDict
from dataclasses import dataclass
@dataclass(frozen=True)
class Cell:
value: str
# This works
set([Cell(value="a"), Cell(value="b")])
class WeightedSeeds(BaseModel):
seeds: Set[Cell]
foo = WeightedSeeds(seeds=set([Cell(value="a"), Cell(value="b")]))
# This raises unhashable type errors
foo.model_dump() |
Possible solutions 1- Simplify everything and go back to native 2- write our own de(serializers) for a base class to be inherited from all complex Enum types. This strategy has a few problems:
3- Keep a service somewhere that takes a string and returns these complex objects. This is potentially very interesting as it would allow the schema to be decoupled from the specifics of this database. For instance, the schema would only have to keep track of all possible manufacturers without having to keep track of abbreviations, numeric IDs and other properties that might otherwise change with time. |
@saskiad @dyf looping you in too. I think this is a rather annoying regression bug (since these started off as an Enum). Especially when other AIND tools like the watchdog could benefit greatly from this change. I will still look into the second solution but it just doesn't feel right. We should really just make all these Enums. It would also solve the weird pattern of typing the properties with Type.ONE_OF but using a different instance (Type.Value). |
Enums would be nice. I can't recall anymore exactly what the issue was that made me switch, but enums of objects were creating headaches somewhere. I do remember now that when we upgraded to v2, my original enum metadata class broke. I went with this solution. I'm open to suggestions for improvements since this has always been troublesome. I also remember now too, that most of the docs I was reading were using tagged unions for this kind of situation, but not quite the same situation. |
@jtyoung84 It's late and I might be missing something very obvious, but what about this? from aind_data_schema_models.modalities import Modality
from pydantic import BaseModel, Field
from enum import Enum
from typing import Dict, Set
class Modalities(Enum):
BEHAVIOR = Modality.BEHAVIOR
OPHYS = Modality.POPHYS
ICEPHYS = Modality.ICEPHYS
class SubSetOfModalities(Enum):
BEHAVIOR = Modality.BEHAVIOR
OPHYS = Modality.POPHYS
class Foo(BaseModel):
modality: Dict[Modalities, str] = Field(default={Modalities.BEHAVIOR: "foo"})
stuff: Set[Modalities] = Field(default={Modalities.BEHAVIOR, Modalities.OPHYS})
another_subset: SubSetOfModalities = Field(default=SubSetOfModalities.BEHAVIOR, validate_default=True)
foo = Foo()
print(foo)
print(foo.model_dump()) This is just a proof of concept, we could improve a bit the subtyping I think. I think it would be so much cleaner for users too! To be clear, I still think Enum of strings would be far easier to maintain. |
I have been playing with this a bit more, I think we can make this change be fully backward compatible with users code. The trick would be to define the subsets of Modalities as Literals (This is the python standard in most libraries anyway, so i think it is actually a good idea...) The previous example would thus become: from aind_data_schema_models.modalities import Modality
from pydantic import BaseModel, Field
from enum import Enum
from typing import Dict, Set, Literal
class Modalities(Enum):
BEHAVIOR = Modality.BEHAVIOR
OPHYS = Modality.POPHYS
ICEPHYS = Modality.ICEPHYS
class Foo(BaseModel):
modality: Dict[Modalities, str] = Field(default={Modalities.BEHAVIOR: "foo"})
stuff: Set[Modalities] = Field(default={Modalities.BEHAVIOR, Modalities.OPHYS})
another_subset: Literal[Modalities.BEHAVIOR, Modalities.OPHYS] = Field(..., validate_default=True)
foo = Foo(another_subset=Modalities.BEHAVIOR) # type hint: "another_subset: Literal[Modalities.BEHAVIOR, Modalities.OPHYS]"
print(foo)
print(foo.model_dump()) |
Ignore the previous two suggestions. I dont think we want to go down this route as both the json-schema and round-trip (de)serialization breaks. print(foo.model_validate_json(foo.model_dump_json())) leads to:
It seems pydantic cant automatically convert the json object instance to the corresponding python class. This could be done with an extra validator, but I still think we should go with Enum of type string to keep the solution simple. |
@bruno-f-cruz I'm going to assign this to @dbirman to see if this is a still an issue. |
Okay so revisiting this, I think the codegen -> enum classes is the solution here, yes? If so, let's close this issue and I'll look into getting that done this sprint. |
I dont think sgen alone will fix it since the pattern would still be the same. The problem rests with using classes as enum values unfort.
Here's some more material relevant to the discussion AllenNeuralDynamics/aind-data-schema-models#64 |
Okay so I take it your other PRs although they reference this issue are not in fact fixing this problem they are just allowing you to use Union[], but you still can't use a straight dictionary anywhere because pydantic classes can't be used as keys and they break model_dump (since it dumps to a dictionary). So the only real fix here would be to generate e.g. the abbreviations of the models as enums and store their data in a separate class? |
Not sure if there is a different intended use...
To reproduce:
The text was updated successfully, but these errors were encountered: