Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Resolve Python keyword conflict in Pydantic models #12819

Closed
wants to merge 2 commits into from
Closed

fix: Resolve Python keyword conflict in Pydantic models #12819

wants to merge 2 commits into from

Conversation

teleprint-me
Copy link

@teleprint-me teleprint-me commented Jul 12, 2023

  • Renamed 'IN' attribute in TokenPatternString and TokenPatternNumber models to 'IS_IN' to avoid conflict with Python's 'in' keyword.
  • Updated the corresponding Field alias to 'is_in' from 'in'.
  • This fix prevents a TypeError upon the initialization of these models due to the improper use of a reserved Python keyword as an attribute.

Description

This PR addresses an issue in which the use of the 'in' keyword as an attribute in the Pydantic models TokenPatternString and TokenPatternNumber resulted in a TypeError. To resolve this, 'IN' has been renamed to 'IS_IN', and the associated Field alias has been updated from 'in' to 'is_in'. As a result, these models can now be initialized without running into a conflict with Python's reserved keywords.

The changes were not tested due to issues with relative imports in the development environment.

Note that these changes may affect the loading of existing trained models or saved data that rely on the 'in' field. As such, proper migration strategies should be considered.

Types of change

This PR covers a bug fix.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

This is a suggested fix for issue #12731.

- Renamed 'IN' attribute in TokenPatternString and TokenPatternNumber models
  to 'IS_IN' to avoid conflict with Python's 'in' keyword.
- Updated the corresponding Field alias to 'is_in' from 'in'.
- This fix prevents a TypeError upon the initialization of these models due
  to the improper use of a reserved Python keyword as an attribute.
- Revert the `IS_IN` property to `IN` to prevent regressions
- Keep the `is_in` alias to prevent conflicts with Python `in` keyword
@svlandeg svlandeg added the bug Bugs and behaviour differing from documentation label Jul 13, 2023
@svlandeg svlandeg linked an issue Jul 13, 2023 that may be closed by this pull request
@svlandeg svlandeg added the feat / cli Feature: Command-line interface label Jul 14, 2023
@adrianeboyd adrianeboyd removed the feat / cli Feature: Command-line interface label Jul 17, 2023
@adrianeboyd
Copy link
Contributor

adrianeboyd commented Jul 17, 2023

We definitely wouldn't rename IN in the patterns and I don't think that it is necessary to rename in in the alias.

The pydantic error related to ValueError: 'in' is not a valid parameter name with python 3.11 is fixed in pydantic v1.10, which is supported for spacy v3.4.2+ without any further changes to the schemas.

Can you double-check by installing spacy v3.4.2+ from scratch in a clean venv? I suspect that you might be running into problems because pydantic didn't get upgraded cleanly, which is a problem with pip sometimes.

Using spacy v3.4.2+ is the only working option for spacy with python 3.11. Alternatively, you can use an earlier version of python if you specifically need an earlier version of spacy.

@teleprint-me
Copy link
Author

teleprint-me commented Jul 17, 2023

@adrianeboyd

I appreciate the response and information.

Did you review the version information?

I use poetry, so handling versions is much simpler as a result.

The issue isn't resolved though. I've already validated it and reviewed the source and ran the tests. The alias parameter cannot be a Python keyword. This is a fact, not an opinion.

I will review, test, and verify once again when I get a chance to.

@adrianeboyd
Copy link
Contributor

Yes, I saw the version information, which is why my best guess is that a package install might be corrupted in your current environment, since the error ValueError: 'in' is not a valid parameter name should be fixed with pydantic v1.10+.

I can see how using a keyword as an alias causes limitations for the pydantic model in general, but I haven't been able to find an example where it causes problems for this usage for token pattern validation in spacy. I double-checked and extended the test cases to confirm: #12835

If you're running into a different error (you mention a TypeError above?) it might be something different than in the original issue. If that's the case, please open a new issue with the exact steps you're using to install spacy in a new venv, the output of spacy info and pip freeze, and the full error traceback.

@teleprint-me
Copy link
Author

teleprint-me commented Jul 20, 2023

@adrianeboyd

The issue is in the requirements.txt which pulls in the pydantic package without the fix.

If you do a fresh install with just pydantic, these are the results:

import pytest
from pydantic import BaseModel, Field


# Define the model with 'in' as an alias
class ModelWithIn(BaseModel):
    attribute: str = Field(..., alias="in")


# Define the model with 'is_in' as an alias
class ModelWithIsIn(BaseModel):
    attribute: str = Field(..., alias="is_in")


# Test case for the model with 'in' as an alias
def test_model_with_in():
    with pytest.raises(ValueError):
        ModelWithIn(**{"in": "value"})


# Test case for the model with 'is_in' as an alias
def test_model_with_is_in():
    try:
        ModelWithIsIn(**{"is_in": "value"})
    except ValueError:
        pytest.fail("Unexpected ValueError ..")


if __name__ == "__main__":
    pytest.main()
10:59:51 | ~/Documents/code/test_pydantic
(.venv)  λ pip list --format=columns              
Package           Version
----------------- -------
annotated-types   0.5.0
iniconfig         2.0.0
packaging         23.1
pip               23.1.2
pluggy            1.2.0
pydantic          2.0.3
pydantic_core     2.3.0
pytest            7.4.0
setuptools        68.0.0
typing_extensions 4.7.1
wheel             0.40.0
11:12:53 | ~/Documents/code/test_pydantic
(.venv)  λ pytest test.py
=============== test session starts ===============
platform linux -- Python 3.11.3, pytest-7.4.0, pluggy-1.0.0
rootdir: /home/austin/Documents/code/test_pydantic
plugins: anyio-3.7.0, typeguard-4.0.0, requests-mock-1.10.0
collected 2 items                                 

test.py F.                                  [100%]

==================== FAILURES =====================
_______________ test_model_with_in ________________

    def test_model_with_in():
>       with pytest.raises(ValueError):
E       Failed: DID NOT RAISE <class 'ValueError'>

test.py:17: Failed
============= short test summary info =============
FAILED test.py::test_model_with_in - Failed: DID NOT RAISE <class 'ValueError'>
=========== 1 failed, 1 passed in 0.08s ===========

This validates your assertion that issue was resolved with a particular version. However, after further investigation into the versioning, I was able to discover the pydantic version in the requirements.txt file for spacy.

The spacy library requires pydantic version >=1.7.4,!=1.8,!=1.8.1,<1.11.0 according to the requirements file. This means it needs a version of pydantic that is greater than or equal to 1.7.4 but not 1.8 or 1.8.1, and less than 1.11.0.

So, when users do a fresh pip install, it doesn't pull in the fix and they run into the issue I and @ipratham101 ran into.

As per your request for spacy info, no new information is acquired.

11:15:33 | ~/Documents/code/git/pygptprompt
(.venv) git:(main | Δ) λ spacy info               

============================== Info about spaCy ==============================

spaCy version    3.6.0                         
Location         /home/austin/Documents/code/git/pygptprompt/.venv/lib/python3.11/site-packages/spacy
Platform         Linux-6.4.3-arch1-2-x86_64-with-glibc2.37
Python version   3.11.3                        
Pipelines                                      

In either case, the use of keywords for aliases should be avoided regardless. Especially if there are plans to enforce the versioning.

In conclusion, the issue lies in the pydantic version that spacy is using. A possible solution would be for spacy to update the pydantic version requirement to include the version with the fix.

I'm eager to see any further feedback on this issue seeing as I'm relying on these packages as a dependency and would like to see the issue resolved.

@adrianeboyd
Copy link
Contributor

I'm sure that this problem is fixed in pydantic v1.10.x, so if you're still having trouble, this really points to an installation or environment problem. pip can definitely be frustrating here.

I can recommend:

In an existing venv, try to uninstall packages multiple times before installing a different version, especially if you've been using any editable installs:

python -m pip uninstall -y pydantic
python -m pip uninstall -y pydantic
python -m pip install 'pydantic~=1.10.0'

However, it's easiest to get things working in a brand new venv, without upgrading or downgrading any packages. Install all packages with one single install command like pip install -r requirements.txt command or whatever you use.

Skip your pip cache while installing in case a cached download is corrupted:

python -m pip install 'pydantic~=1.10.0' --no-cache-dir

If skipping the cache solves the problem, then purge your pip cache to delete any corrupted packages:

python -m pip cache purge

@teleprint-me
Copy link
Author

teleprint-me commented Jul 21, 2023

This holds true when multiple users encounter the same exact issue on different machines and operating systems.

It just feels like you're ignoring what's happening at this point which makes my efforts feel moot.

Walking me through a .venv step-by-step when I've clearly showcased competency in navigating a complex code base makes me question what actions you're taking in pursuing a resolution for this issue.

I've reported the details, the output, the logs, the tests, and even included a PR.

I did delete my '.venv' and restart from scratch. It did occur to me and I've clearly outlined how and why.

There's a version constraint and you have conflicting outstanding issues that need to be resolved.

Come up with a plan on how to tackle it. It requires time and planning via review and testing.

It would be less disingenuous if it didn't feel like you were attempting to kill me with kindness here.

"It works on my machine so you must be doing something wrong." is never a valid rationale.

If no steps are taken to resolve this, then I think users would find value in understanding what version they're constrained to so they understand how to avoid this particular issue.

If no solution will be accepted for this issue, then something should be done, such as a notice that states, "due to a ValueError occurring in Pydantic versions..." etcetera.

I think my disposition is fair and my frustration should be understandable.

@teleprint-me
Copy link
Author

pydantic/HISTORY.md#v1108-2023-05-23

The version required here is v1.10.8. So, if a project, say like chroma, requires anything lower, and I specify that in my pyproject.toml and run poetry install, then it will generate an error stating a package incompatibility.

Simply upgrading isn't necessarily a valid solution as a result.

@adrianeboyd
Copy link
Contributor

While your frustration related to managing python dependencies is understandable, the rest of this interaction is much less so. We have provided the version information you should need to sort this out for your environment and your project-specific requirements. Sorry, I'm afraid I don't see a productive way forward for this discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs and behaviour differing from documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

i am not able to download en modules like sm, md.
3 participants