Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Please make trove classifiers validation optional #4459

Open
eli-schwartz opened this issue Jul 5, 2024 · 11 comments · May be fixed by #4611
Open

[BUG] Please make trove classifiers validation optional #4459

eli-schwartz opened this issue Jul 5, 2024 · 11 comments · May be fixed by #4611

Comments

@eli-schwartz
Copy link
Contributor

eli-schwartz commented Jul 5, 2024

setuptools version

from git, also 70.0.0

Python version

Python 3.12

OS

Gentoo Linux

Additional environment information

No response

Description

I tried building a wheel for a package that uses a trove classifier in the very latest version of the trove_classifiers package. Setuptools detected that trove_classifiers was installed, and insisted on validating it.

Looking at the code which handles this, it offers an escape hatch only to "avoid hitting the network" if trove_classifiers is NOT installed:

class _TroveClassifier:
"""The ``trove_classifiers`` package is the official way of validating classifiers,
however this package might not be always available.
As a workaround we can still download a list from PyPI.
We also don't want to be over strict about it, so simply skipping silently is an
option (classifiers will be validated anyway during the upload to PyPI).
"""
downloaded: typing.Union[None, "Literal[False]", typing.Set[str]]
def __init__(self) -> None:
self.downloaded = None
self._skip_download = False
# None => not cached yet
# False => cache not available
self.__name__ = "trove_classifier" # Emulate a public function
def _disable_download(self) -> None:
# This is a private API. Only setuptools has the consent of using it.
self._skip_download = True
def __call__(self, value: str) -> bool:
if self.downloaded is False or self._skip_download is True:
return True
if os.getenv("NO_NETWORK") or os.getenv("VALIDATE_PYPROJECT_NO_NETWORK"):
self.downloaded = False
msg = (
"Install ``trove-classifiers`` to ensure proper validation. "
"Skipping download of classifiers list from PyPI (NO_NETWORK)."
)
_logger.debug(msg)
return True
if self.downloaded is None:
msg = (
"Install ``trove-classifiers`` to ensure proper validation. "
"Meanwhile a list of classifiers will be downloaded from PyPI."
)
_logger.debug(msg)
try:
self.downloaded = set(_download_classifiers().splitlines())
except Exception:
self.downloaded = False
_logger.debug("Problem with download, skipping validation")
return True
return value in self.downloaded or value.lower().startswith("private ::")
try:
from trove_classifiers import classifiers as _trove_classifiers
def trove_classifier(value: str) -> bool:
"""See https://pypi.org/classifiers/"""
return value in _trove_classifiers or value.lower().startswith("private ::")
except ImportError: # pragma: no cover
trove_classifier = _TroveClassifier()

Expected behavior

A way to disable the unwanted functionality without first uninstalling other packages. For example, instead of $VALIDATE_PYPROJECT_NO_NETWORK, I would like to see $VALIDATE_PYPROJECT_NO_TROVE_CLASSIFIERS.

How to Reproduce

  1. git clone https://github.com/urschrei/pyzotero
  2. python -m build -nwx

Output

* Building wheel...
configuration error: `project.classifiers[10]` must be trove-classifier
DESCRIPTION:
    `PyPI classifier <https://pypi.org/classifiers/>`_.

GIVEN VALUE:
    "License :: OSI Approved :: Blue Oak Model License (BlueOak-1.0.0)"

OFFENDING RULE: 'format'

DEFINITION:
    {
        "type": "string",
        "format": "trove-classifier"
    }

For more details about `format` see
https://validate-pyproject.readthedocs.io/en/latest/api/validate_pyproject.formats.html

Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/pyproject_hooks/_in_process/_in_process.py", line 373, in <module>
    main()
  File "/usr/lib/python3.12/site-packages/pyproject_hooks/_in_process/_in_process.py", line 357, in main
    json_out["return_val"] = hook(**hook_input["kwargs"])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/pyproject_hooks/_in_process/_in_process.py", line 271, in build_wheel
    return _build_backend().build_wheel(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/build_meta.py", line 410, in build_wheel
    return self._build_with_temp_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/build_meta.py", line 395, in _build_with_temp_dir
    self.run_setup()
  File "/usr/lib/python3.12/site-packages/setuptools/build_meta.py", line 311, in run_setup
    exec(code, locals())
  File "<string>", line 12, in <module>
  File "/usr/lib/python3.12/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/_distutils/core.py", line 158, in setup
    dist.parse_config_files()
  File "/usr/lib/python3.12/site-packages/setuptools/dist.py", line 632, in parse_config_files
    pyprojecttoml.apply_configuration(self, filename, ignore_option_errors)
  File "/usr/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 68, in apply_configuration
    config = read_configuration(filepath, True, ignore_option_errors, dist)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 133, in read_configuration
    validate(subset, filepath)
  File "/usr/lib/python3.12/site-packages/setuptools/config/pyprojecttoml.py", line 57, in validate
    raise ValueError(f"{error}\n{summary}") from None
ValueError: invalid pyproject.toml config: `project.classifiers[10]`.
configuration error: `project.classifiers[10]` must be trove-classifier

ERROR Backend subprocess exited when trying to invoke build_wheel
@eli-schwartz eli-schwartz added bug Needs Triage Issues that need to be evaluated for severity and status. labels Jul 5, 2024
@eli-schwartz
Copy link
Contributor Author

See also: urschrei/pyzotero#178

@mgorny
Copy link
Contributor

mgorny commented Jul 5, 2024

Making it into a warning rather than error would also work for us.

@jaraco
Copy link
Member

jaraco commented Jul 17, 2024

After reviewing the code, I can add some detail.

The escape hatch was created to avoid network access for the fallback behavior when the trove_classifiers package isn't present. It assumes that if that package is present, it contains valid data and should be used to do the validation. The fact that it's stale suggests that maybe that package is not yet valid in that configuration.

I'm a little reluctant to add yet another escape hatch for this one configuration item especially for this rare case that's likely to resolve itself over time.

I'd be more inclined to just remove support for the trove_classifiers package and have a single unified "from PyPI" validator with its escape hatches and warnings.

The suggestion to make it a warning instead of an error seems reasonable, but my guess is that most users will miss the warning and only encounter it when attempting to upload to an index anyway, suggesting maybe the validation should be removed altogether.

But, the validation logic isn't maintained here, it's maintained instead in abravalheri/validate-pyproject.

@abravalheri What do you think of the proposal and the alternatives?

@jaraco jaraco added enhancement downstream and removed bug Needs Triage Issues that need to be evaluated for severity and status. labels Jul 17, 2024
@eli-schwartz
Copy link
Contributor Author

The fact that it's stale suggests that maybe that package is not yet valid in that configuration.

The fact that it's stale in this case indicates we never intended trove_classifiers to be valid at all, but we build without virtualenv isolation.

I'm a little reluctant to add yet another escape hatch for this one configuration item especially for this rare case that's likely to resolve itself over time.

It's not going to resolve itself over time, because due to dependency graph ordering there's no real reason:

  • when both trove_classifiers and pyzotero are installed globally at the distro level
  • when both have pending updates

for trove_classifiers to end up queued for installation first. Instead, updating fails, and continues to fail because every time a system update is attempted, it fails again.

If setuptools removed its validation entirely, or made it only use the network, that would equally solve the gentoo use case though. :) We already disable networking anyway.

Of course, one might ask, why not just build in an isolated environment without trove_classifiers? That's recommended by PyPA via virtualenvs. But distros generally don't do this, because they already have both network isolation (cannot install build-requires with pip) and clean build isolation via spinning up temporary containers with just distro-packaged dependencies installed. And distros want to use their own packaged dependencies for various reasons.

... but Gentoo (can, and usually does) build from source on every user's machine, and spinning up containers for every package is a hassle, slow, implies nontrivial setup, and generally is impractical. Distros like Debian and Fedora have complex distributed systems to do this and reset to a clean slate in between each build, so it doesn't matter if it takes 15 minutes to build each pure-python wheel because they have tends of buildbots building thousands of packages, and each one only has to get built once.

And most other cases of additional packages installed in a build environment aren't a problem. ;) Build isolation primarily solves the problem of people forgetting to specify everything the build depends on because they already have it installed.

So gentoo usually does what those other distros also do when individual users want to build a .deb or .rpm, and builds on the live system, and relies on --disable-foo style options where applicable (e.g. C/C++ environments, where there is a culture of build options, unlike Python where build options largely aren't a thing unless you're building data science packages that use both C++ and Fortran and cannot be built by pip because pip doesn't know how to install OpenBLAS).

And so we end up here, asking for, well, a --disable-foo style option to disable trove classifier checking. :D

@mgorny
Copy link
Contributor

mgorny commented Jul 17, 2024

But, the validation logic isn't maintained here, it's maintained instead in abravalheri/validate-pyproject.

Yeah, I've noticed that earlier and I was just about to open a pull request there. Good that I've checked mail first, let's see what resolution is preferred.

I'd be more inclined to just remove support for the trove_classifiers package and have a single unified "from PyPI" validator with its escape hatches and warnings.

Do you mean one that fetches data straight from PyPI? As long as we can force-disable Internet access, that would work for us.

@abravalheri
Copy link
Contributor

abravalheri commented Jul 18, 2024

I'd be more inclined to just remove support for the trove_classifiers package and have a single unified "from PyPI" validator with its escape hatches and warnings.

I would love to take this approach, but the PyPI Web page for listing classifiers is not completely official. I know that flit uses it... but the standards mention the package as the only true source of truth... So I am not ready to do that move unless there is commitment in supporting the HTTP api for classifiers in PyPI.

My view is that this is a tricky problem, but also a minor one... Once the latest trove-classifiers package is adopted downstream, the problem would naturally go away, right? Also, the trove-classifiers package should be mostly backwards compatible, so the risk of adopting a new version is very low...

A question: if the problem is the installation order is it possible to make trove-classifiers a build-only dependency of pyzotero? That would ensure that the latest classifiers are installed right?

@eli-schwartz
Copy link
Contributor Author

My view is that this is a tricky problem, but also a minor one... Once the latest trove-classifiers package is adopted downstream, the problem would naturally go away, right? Also, the trove-classifiers package should be mostly backwards compatible, so the risk of adopting a new version is very low...

I'm pretty sure I was quite clear that the problem does not naturally go away ever, because the solver gets into a stuck state and doesn't get unstuck until someone manually kicks it -- because it will keep trying to install the failing package before trove_classifiers. That usually means patching the setuptools distro package to have a hard dependency on trove_classifiers >= $LATEST_RELEASE which is kind of annoying.

It would be much better if we could unstick things automatically by telling setuptools to stop performing validations we don't want. We aren't uploading to PyPI.

It's confusing anyway because there's absolutely no reason people shouldn't be allowed to use whatever trove classifiers they like, even unofficial ones, on private self-run indexes.

@eli-schwartz
Copy link
Contributor Author

A question: if the problem is the installation order is it possible to make trove-classifiers a build-only dependency of pyzotero? That would ensure that the latest classifiers are installed right?

Just want to reiterate here that adding trove-classifiers as build-requires for arbitrary packages throughout the python ecosystem is not a very comfortable solution in the slightest.

Especially because trove-classifiers may not need to be installed at all -- and in fact the only reason I have it on my system is because hatchling required it.

It is terrible UX to have pyzotero require an additional build-requires, solely because if it doesn't have that build-requires then installing hatchling will result in setuptools erroring out. It's not even the kind of bug that anyone will notice for a while.

@abravalheri
Copy link
Contributor

Hi @eli-schwartz, thank you very much for the reply. I understand your concerns and appreciate your feedback.

Please note that I am not suggesting to add trove-classifiers as a build-requirements to all packages throughout the ecosystem, but rather this package in particular. I don't imagine that the majority of the packages on the ecosystem will require the very latest trove-classifiers.

Also note that when I suggested that the "problem eventually goes away" I was referring to "greenfield" deployments in the future when the trove-classifiers in the repositories that gets pulled by default is already new enough (at least this is my impression). I do agree that for a particular machine that already have the packages installed, it does not go away and require manual intervention.

I understand that you guys have your reasons to disable build isolation, but I would like to point out that build isolation was introduced to fix this kind of problem1... It is perfectly fine for OS-maintainers to attempt to optimise builds, but I think that in this case when the lack of build-isolation is directly involved in a bug, it would be important to explore other workarounds (like the inclusion in this particular package of trove-classifiers as a build dependency to force a particular installation order; or removing the problematic classifier in the package metadata).

I aim to keep the codebase as simple as possible, which is why I’m hesitant to add yet more knobs and levers. However, one potential solution could be introducing an environment variable like SETUPTOOLS_DANGEROUSLY_SKIP_PYPROJECT_VALIDATION to skip the validations all together in https://github.com/pypa/setuptools/blob/main/setuptools/config/pyprojecttoml.py. In general, this would be risky, as it bypasses important checks that reduce the set of accepted inputs; but for classifiers, it should be fine. The responsibility would fall on the developer/user that sets this environment variable to ensure the overall configuration is valid.

Regarding the idea of adding --disable-foo style options to setuptools/validate-pyproject, the current codebase (in both setuptools and validate-pyproject) doesn’t support this. However, if anyone has concrete proposals and is willing to contribute PRs, I’m open to discussing and reviewing these suggestions.

Footnotes

  1. Disabling build-isolation is historically know to cause problems with some features of the PyPA packaging ecosystem like optional dependencies and entrypoint-based plugins.

@eli-schwartz
Copy link
Contributor Author

I understand that you guys have your reasons to disable build isolation, but I would like to point out that build isolation was introduced to fix this kind of problem1...

  1. Disabling build-isolation is historically know to cause problems with some features of the PyPA packaging ecosystem like optional dependencies and entrypoint-based plugins.

This is indeed a long-standing flaw in the python ecosystem, yes. :)

Automagically doing the wrong thing based on unpredictable environment inputs is bad, and everyone agrees on that. Due to several decades of legacy baggage, it is difficult to solve this for setuptools because the automagic environment inputs serve multiple roles including extensibility by way of entrypoint-based plugins.

Build isolation is not, and never has been, a solution to this engineering problem. It's a way to avoid the side effects in a scenario where the design constraints prevent a solution from being achieved. It is also, largely, a problem that only affects setuptools and doesn't affect other build backends which take a more structural approach to defining builds.

You may wish to take note here of the wording for PEP 517. It describes build isolation and marks it as an RFC 2119 compliant "SHOULD" feature, with two PEP rationales:

  1. that it permits building multiple incompatible packages whose build requirements are mutually exclusive (and calls out pinned versions as the probable cause)
  2. that it provides the hygienic ability to catch scenarios where people have forgotten to list all their build requirements

It doesn't mention the possibility of additional environment packages modifying your own build requirements, indicating that this isn't a motivational concern behind the recommendation to use build isolation.

@eli-schwartz
Copy link
Contributor Author

Regarding the idea of adding --disable-foo style options to setuptools/validate-pyproject, the current codebase (in both setuptools and validate-pyproject) doesn’t support this. However, if anyone has concrete proposals and is willing to contribute PRs, I’m open to discussing and reviewing these suggestions.

Just for clarity, my mention of --disable-foo style options was intended as an explicit way to be evocative of GNU autoconf, not because I specifically believe setuptools should be configured via CLI flags instead of environment variables.

I suspect environment variables is ultimately easier for setuptools, if only because it doesn't require passing state around through multiple layers of "execute the setup.py script together with a phase to run".

The thing that I care about is the Autoconf model where functionality is expected to have some form of on/off switch rather than unconditionally probe whether an import is available.

@abravalheri abravalheri linked a pull request Aug 28, 2024 that will close this issue
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants