Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cech persistence sklearn #1126

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

VincentRouvreau
Copy link
Contributor

@VincentRouvreau VincentRouvreau commented Aug 28, 2024

No description provided.


class CechPersistence(BaseEstimator, TransformerMixin):
"""
This is a class for constructing Čech complexes and computing the persistence diagrams from them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sentence is very misleading: no Cech complex is built in the process, that's the point of the alpha and delaunay-cech, they build smaller filtrations that still have the same diagram.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked class explanation on 322487c

@mglisse
Copy link
Member

mglisse commented Sep 12, 2024

CI will fail as it is uses and requires #1117

It actually passes. I guess because it isn't in CMakeLists.txt yet?

@VincentRouvreau VincentRouvreau marked this pull request as ready for review September 24, 2024 07:35
@VincentRouvreau VincentRouvreau added the enhancement New feature or request label Sep 24, 2024
@DavidLapous
Copy link
Contributor

Could we also add typing to this pipeline (cf #920) ?
Also, I'm not a big fan of the "max_alpha_square"; I agree that we should be coherent with alpha complexes,
but I'm already not a fan of the alpha square there (cf #771 @CharlesArnal).
Furthermore, Rips complexes are also expected to be coherent with Cech complexes.
I'm more in favor to "threshold", but this can be discussed.

@mglisse
Copy link
Member

mglisse commented Sep 26, 2024

Could we also add typing to this pipeline (cf #920) ?

As I said in the other PR, the interaction with the doc is suboptimal currently, Vincent is looking into options to make it nicer so we can start adding typing all over the place.

Also, I'm not a big fan of the "max_alpha_square";

An option would be to remove it, it is not very useful for alpha-complex / delaunay-cech (what this class currently uses), but it seems likely that we will add a pure Cech option at some point (useful in high dimension with a very low threshold) where the threshold will be essential.

I agree that we should be coherent with alpha complexes, but I'm already not a fan of the alpha square there (cf #771 @CharlesArnal).

Ah, right, this was supposed to be the occasion to tackle this sqrt business, so let's do that.

Furthermore, Rips complexes are also expected to be coherent with Cech complexes.

Note that even with the square root, we will likely stick to a convention where the edges have a filtration twice as large in the Rips than in the Cech (diameter vs radius).


So, what would be a good way to handle the square/sqrt issue?
The squared output is necessary for some weighted cases, so it cannot disappear. But users often want non-squared filtration values.
We could have an option (output_squared_filtrations, do_sqrt, other suggestions?), and depending on this option, we would apply np.sqrt to the diagram before outputting it? We could also add the option to AlphaComplex (computing the sqrt in the SimplexTree is probably more costly than on a numpy array, so from CechPersistence we would likely build the AlphaComplex with squared values always and compute the square root on the diagram).
Another question is what we want by default. It looks like there are a lot of vote in favor of sqrt-by-default. If so, we will need to be super explicit (and redundant) in the doc about what the default is, and the fact that it is not the same as in AlphaComplex.
The name threshold is good, but how does it interact with the option above? If we use square roots, then it seems natural for the threshold to be a length. But in the case with squared filtrations, the threshold would be a squared thing as well I guess (necessary if weights turn everything negative)? So maybe the option should not be output_squared_filtrations because the square is not only used in the output, maybe use_squared_filtrations or just squared_filtrations?

@DavidLapous
Copy link
Contributor

Note that even with the square root, we will likely stick to a convention where the edges have a filtration twice as large in the Rips than in the Cech (diameter vs radius).

IIRC, in the literature, this is often presented as

$$\forall t \ge 0, \quad \mathrm{\v Cech}(X)_t \simeq \mathrm{Alpha}(X)_t\quad \textnormal{ and }\quad \mathrm{\v Cech}(X)_t \subseteq \mathrm{VR}(X)_{2t}\subseteq \mathrm{\v Cech}(X)_{2t},$$

so this factor 2 is to be expected (but could be indeed be discussed).

So, what would be a good way to handle the square/sqrt issue?

I think a good solution to this squared issue would be to have two separate classes in the interface,
e.g., CechComplex and WeightedAlphaComplex. This would also make the threshold clear.

@mglisse
Copy link
Member

mglisse commented Sep 26, 2024

I think a good solution to this squared issue would be to have two separate classes in the interface, e.g., CechComplex and WeightedAlphaComplex. This would also make the threshold clear.

Do I understand correctly that you would split squared / not squared as follows, without options?

  • classes that build an unweighted complex: CechComplex, AlphaComplex, DelaunayCechComplex -> sqrt
  • separate weighted classes: WeightedCechComplex, WeightedAlphaComplex, WeightedDelaunayCechComplex -> squared
  • classes that give the persistence diagram directly: CechPersistence -> sqrt, WeightedCechPersistence -> squared

This means that

  • CechComplex gives a different output than WeightedCechComplex with all 0 weights, it is still confusing
  • we cannot get AlphaComplex with squared values (the current behavior), or WeightedThing with sqrt values
  • it forces to have twice as many classes (that's much more code, but it may make the documentation a bit simpler, I don't know)

It is an acceptable possibility, but I wouldn't go so far as to call it "good".

@VincentRouvreau
Copy link
Contributor Author

As I said in the other PR, the interaction with the doc is suboptimal currently, Vincent is looking into options to make it nicer so we can start adding typing all over the place.

cf. #1137

@DavidLapous
Copy link
Contributor

DavidLapous commented Oct 2, 2024

  • CechComplex gives a different output than WeightedCechComplex with all 0 weights, it is still confusing

A possibility is to put directly the square in the name of the concerned classes. I've seen stuff like sqnorm in machine learning libraries for instance, which seems reasonable. I don't know if there is a perfect solution to this issue indeed.

  • we cannot get AlphaComplex with squared values (the current behavior), or WeightedThing with sqrt values

Yes, but I feel that users that want to compute weighted alpha complexes, or squared alpha complexes specifically are not neophyte to TDA (I've never seen someone use weights, or want the square), so we can expect them to read the documentation for this specific use case. --- I may be mistaken on these stats.

or WeightedThing with sqrt values

Q. Is it possible to define WeightedAlphaComplexes with a squared rescaling, while preserving the weights on the points?
If there exists a natural definition, we can just remove the square in all cases, and put a flag to enable this.

  • it forces to have twice as many classes (that's much more code, but it may make the documentation a bit simpler, I don't know)

If the number of classes is an issue, we can just alias classes to other with proper default values. Doing multiple classes also reduces the cyclomatic complexity, so depending on what we are doing, it's not clear what is the best.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants