Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add guide on how to handle missing extras at runtime #1606

Draft
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

smheidrich
Copy link

@smheidrich smheidrich commented Oct 5, 2024

This isn't even close to done yet, I'm just opening a PR already so that if anyone has objections to the basic direction this is taking, they can already raise them.

Fixes #1605

@smheidrich smheidrich force-pushed the fix-gh-1605-handling-missing-extras branch from 5f0ec81 to 0f034b6 Compare October 5, 2024 12:29
@sinoroc
Copy link
Contributor

sinoroc commented Oct 5, 2024

You can use importlib.metadata to check what is installed in the environment. This is in the standard library, no need to install setuptools for this.

And typically in Python I guess you would rather try to import and then handle if it fails. Something like this off the top of my head:

try:
    import some_library
except ImportError, ModuleNotFoundException:
    handle_missing_dependency()

@smheidrich
Copy link
Author

smheidrich commented Oct 5, 2024

You can use importlib.metadata to check what is installed in the environment. This is in the standard library, no need to install setuptools for this.

Yes, and that's what the solution from pypa/packaging-problems#664 tries to do, but it's quite involved, so I'll probably either link to it or the 3rd party package it ended up in.

And typically in Python I guess you would rather try to import and then handle if it fails. Something like this off the top of my head:

try:
    import some_library
except ImportError, ModuleNotFoundException:
    handle_missing_dependency()

That's definitely on option I'll add (cf. TODO in current version saying "Handling failing imports"), but I don't think it's the be-all and end-all because it can fail in difficult-to-debug ways when there is a version mismatch: Let's say my extra depends on somepkg 1.x and someone else's package otherpkg depends on somepkg 1.x-2.x, with 2.x not being generally backwards-compatible. Now someone installs both my package without the extra and otherpkg. Unless I'm mistaken, installers will just ignore the (unused) extra's version requirement for somepkg 1.x and install the latest version supported by all (actually used) dependencies, which is somepkg 2.x. Now the import itself succeeds, but due to the backwards-incompatibility, chaos ensues.

source/guides/handling-missing-extras-at-runtime.rst Outdated Show resolved Hide resolved
Using ``pkg_resources`` (deprecated)
====================================

The now-deprecated :ref:`pkg_resources <ResourceManager API>` package (part of
Copy link
Contributor

@sinoroc sinoroc Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is now, I would vote against adding a new guide page that recommends using APIs that are deprecated. I would scrap the whole section. We already have too many of those (guides that are already deprecated but can't seem to get rid of).

Copy link
Author

@smheidrich smheidrich Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But at least to me, it seems like the pkg_resources approach remains one of the better options, e.g. for people who don't want to copy&paste the fairly large snippet from pypa/packaging-problems#664 nor depend on hbutils which includes it (but has a lot of other stuff they won't need).

If importlib.metadata or packaging had their own version of this functionality (pypa/packaging-problems#317), I'd agree that there should be no mention of pkg_resources anymore.

I'll definitely set up an automatic reminder to remove this section triggered by pypa/packaging-problems#317 being closed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have the last word on this, I do not have authority on this repository, and it is your pull request. So feel free to push forward with this if you want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setuptools repository has been spending a lot of energy in communicating to users to not use pkg_resources, so I also would prefer not to incentivise it. Probably it is best to not even mention.

Moreover, since the functionality is deprecated, the support is not guaranteed, and once it breaks (pkg_resource implementation can be very fragile), there is no guarantee it is going to be fixed or the function simply removed.

Since setuptools has declared many years ago no interest in maintaining this functionality and it is actively discouraging users from importing it, one alternative that users may be interested in is to create and maintain their own helper in a 3rd-party library, by copying the relevant bits from pkg_resources and/or the mentioned PR. By maintaining a separated 3rd-party library, that avoids constant copy and paste.

Copy link
Author

@smheidrich smheidrich Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one alternative that users may be interested in is to create and maintain their own helper in a 3rd-party library

I've asked the author of hbutils if they'd be fine with splitting off that one function into a separate, smaller package: pypa/packaging-problems#664 (comment)
That would probably make it a more sensible thing for the guide to recommend.

In the meantime, I've added a giant warning above the pkg_resources section, but I guess it could be removed completely if there was a focused replacement package.

@sinoroc
Copy link
Contributor

sinoroc commented Oct 6, 2024

it can fail in difficult-to-debug ways when there is a version mismatch: Let's say my extra depends on somepkg 1.x and someone else's package otherpkg depends on somepkg 1.x-2.x, with 2.x not being generally backwards-compatible. Now someone installs both my package without the extra and otherpkg. Unless I'm mistaken, installers will just ignore the (unused) extra's version requirement for somepkg 1.x and install the latest version supported by all (actually used) dependencies, which is somepkg 2.x. Now the import itself succeeds, but due to the backwards-incompatibility, chaos ensues.

True. It is advanced level, though, and probably a much rarer case. I guess the guide page should very clearly reflect that. Level 1: only check if the things import correctly. Level 2: check if the distributions are installed (importlib.metadata). Level 3: check if the installed distributions have the correct versions (importlib.metadata + packaging or equivalent).

But, please no pkg_resources, it is deprecated, let's not add this kind of cruft.

Detecting missing extras
========================

We first consider how to *detect* if an extra is missing, leaving what to do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, it's not detecting extras but a feature-detection. If somebody installs a library separately, instead of using extras, the effect is the same.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the 2nd sentence but don't see how the first follows from it: If the dependencies for some-pkg[some-extra] are installed separately, not only will the feature that requires the extra become available, but also functions that check if the extra is installed (like pkg_resources.requires(["some-pkg[some-extra]"])) will report that it is, so IMHO it should be fine to say "we're checking if the extra is installed".

Unless you want to make the general distinction between an "extra" as in the part of the package metadata and "extra" as in the actual set of dependencies it describes? 🤔

But even then, "feature detection" sounds very generic, like something that doesn't even necessarily involve dependencies... Unless this is an established term already used elsewhere in the packaging ecosystem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the dependencies for some-pkg[some-extra] are installed separately, not only will the feature that requires the extra become available, but also functions that check if the extra is installed (like pkg_resources.requires(["some-pkg[some-extra]"])) will report that it is, so IMHO it should be fine to say "we're checking if the extra is installed".

Does that always work?

What happens if some-pkg[some-extra] depends on other-pkg < 1, then the user installs some-pkg (without the extra) and then other-pkg > 1? Would pip allow them to do that? If so, that potentially breaks runtime API access...


There is also some criticism on the community about "spooky action at distance". Sometimes users don't want the effect of having an "extra" if the package was not explicitly installed with that extra, even if somehow the dependencies match-up in the virtual env.

This is an open discussion with no solution for the time being in the ecosystem.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if some-pkg[some-extra] depends on other-pkg < 1, then the user installs some-pkg (without the extra) and then other-pkg > 1? Would pip allow them to do that?

Pip would allow the installation but pkg_resources.requires would then correctly report that the extra is not installed due to the version mismatch. That's why I think a check utilizing something like pkg_resources.requires is preferable over a simple "Does the import work?" check.

Sometimes users don't want the effect of having an "extra" if the package was not explicitly installed with that extra, even if somehow the dependencies match-up in the virtual env.

Yes but that's how it currently is so IMO the language of the guide should reflect that reality.

Copy link
Contributor

@sinoroc sinoroc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this whole page looks like a "discussion" (somewhat opinionated even) rather than a "guide".

My advice, keep it much shorter, stick to the sure things, don't get too adventurous. This is a very good idea to have a guide for this, that I am convinced. But I am not convinced by the current content.

If it were me I think I would go like this:

  1. Make a first version of the guide, a simple one, that presents the problem and shows the most simple solution: (try, import, except). Get this merged.
  2. Add the next solution to the guide: check if installed with importlib.metadata. Get this merged.
  3. Add a discussion page with your more involved ideas for checking if the installed version satisfies the dependency constraints.

Comment on lines 56 to 62
This process is currently quite involved. An implementation can be found in
`packaging-problems #664 <packaging-problems-664_>`_, which is also made
available in the `hbutils <https://pypi.org/project/hbutils/>`_ package as
``hbutils.system.check_reqs``.
The possibility of offering a similar helper function in ``importlib.metadata``
or ``packaging`` themselves is still being discussed
(`packaging-problems #317 <packaging-problems-317_>`_).
Copy link
Contributor

@sinoroc sinoroc Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be better to copy-paste the code itself here (if it is not too long).

Copy link
Author

@smheidrich smheidrich Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's basically these lines plus imports: https://github.com/HansBug/hbutils/blob/927b0757449a781ce8e30132f26b06089a24cd71/hbutils/system/python/package.py#L171-L223

Even if the docstring is removed, IMO that's too long (& complex) to include.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to reduce it down to this, which is a grand total of 4 lines shorter than the original (wow...) but IMHO a bit easier to understand:

Slightly modified hbutils.system.check_reqs
from collections.abc import Iterable

from packaging.metadata import Metadata
from packaging.requirements import Requirement

try:
  import importlib.metadata as importlib_metadata
except (ModuleNotFoundError, ImportError):
  import importlib_metadata


def check_reqs(req_strs: Iterable[str]) -> bool:
  return all(
    _check_req_recur(req)
    for req_str in req_strs
    if not (req := Requirement(req_str)).marker or req.marker.evaluate()
  )


def _check_req_recur(req: Requirement) -> bool:
  try:
    version = importlib_metadata.distribution(req.name).version
  except importlib_metadata.PackageNotFoundError:
    return False  # req not installed

  if not req.specifier.contains(version):
    return False  # req version does not match

  metadata = Metadata.from_raw(
    importlib_metadata.metadata(req.name).json, validate=False
  )
  for child_req in metadata.requires_dist or []:
    for extra in req.extras:
      if child_req.marker and child_req.marker.evaluate({"extra": extra}):
        if not _check_req_recur(child_req):
          return False
        break

  return True

So I'd be fine with including it in the guide in this form.

But in the course of doing this I noticed that even the original version doesn't have 100% feature parity with pkg_resources.requires (new issue about this: HansBug/hbutils#109), so it might need to be changed again...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a slightly more compact (but still long) version of this snippet to the guide now.

@smheidrich
Copy link
Author

I feel like this whole page looks like a "discussion" (somewhat opinionated even) rather than a "guide".

My advice, keep it much shorter, stick to the sure things, don't get too adventurous. This is a very good idea to have a guide for this, that I am convinced. But I am not convinced by the current content.

It's "inspired" mainly by the existing Single-sourcing the package version guide, which is a list of half a dozen options that the reader can choose from.

But I agree that the entire section about what to do about missing extras doesn't feel that valuable - it's basically a list of pros and cons of different approaches that readers could have probably figured out themselves in the time it takes to read.

In contrast, detection of missing extras is where a guide is actually useful because the safe solution (i.e. with versions checked) is not that obvious.

If it were me I think I would go like this:

  1. Make a first version of the guide, a simple one, that presents the problem and shows the most simple solution: (try, import, except). Get this merged.

If it just recommends try-import-except, then I don't see a point in having a guide at all because that's what people will try to do naturally without a guide. At least to me, the whole point is to recommend an approach that doesn't have the "version mismatch" safety issue try-import-except has. (If you look at what people who work with other languages don't like about Python, almost all have some horror story about weird undebuggable dependency issues, so if there is an option to reduce the chance of those, it should be taken IMO)

  1. Add the next solution to the guide: check if installed with importlib.metadata. Get this merged.

  2. Add a discussion page with your more involved ideas for checking if the installed version satisfies the dependency constraints.

I don't think there is any point in an importlib.metadata approach if it doesn't check the version - that's just a worse (because less Pythonic, LBYL rather than EAFP) variant of the try-import-except approach.

So all in all, maybe the whole page should just be renamed to "Detecting missing extras as runtime" and be only about that for now?

@sinoroc
Copy link
Contributor

sinoroc commented Oct 8, 2024

It's "inspired" mainly by the existing Single-sourcing the package version guide, which is a list of half a dozen options that the reader can choose from.

This page will disappear very soon:

This page is a very good example of what we do NOT want to have on the guide.

If it just recommends try-import-except, then I don't see a point in having a guide at all because that's what people will try to do naturally without a guide.

The target audience of the guide might be much wider than you think. I am not convinced all users (from the most experienced to the newest beginners) will think of this approach immediately.

I don't think there is any point in an importlib.metadata approach if it doesn't check the version - that's just a worse (because less Pythonic, LBYL rather than EAFP) variant of the try-import-except approach.

It's not that simple. First who says LBYL is more Pythonic that EAFP? OK, that is off-topic, so maybe don't answer. But second and most importantly, it is not because I can import multipart that I actually have done pip install multipart and not pip install python-multipart (read this). Checking that the correct distribution package is installed does matter, because there is no one-to-one relationship between import package name and distribution package name.

@smheidrich
Copy link
Author

First who says LBYL is more Pythonic that EAFP?

I meant the other way round 🙃

Checking that the correct distribution package is installed does matter, because there is no one-to-one relationship between import package name and distribution package name.

Good point 👍 But another argument against leaving out the version check is that it turns out it only makes a difference of 2 lines, as can be seen in #1606 (comment):

if not req.specifier.contains(version):
  return False  # req version does not match

So I'd still prefer going with that variant from the get-go.

Comment on lines +141 to +155
Handling missing extras
=======================

Where and how to embed the detection of missing extras in a package and what
actions to take upon learning the outcome depends on the specifics of both the
package and feature requiring the extra.
Some common options are:

- Raise a custom exception that includes the name of the missing extra.
- In applications, show an error message when an attempt is made to use the
feature that requires the extra.
- In libraries, provide a function that lets library consumers query which
features are available.

... and probably more.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced the verbose "discussion" from the previous version with this short section, hope that is better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should include guidance on how to handle missing optional dependencies / extras at runtime
4 participants