Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing filterRanges Function and Discussion on a possible filterSimilarities #312

Merged
merged 17 commits into from
Mar 11, 2024

Conversation

philouail
Copy link
Collaborator

@philouail philouail commented Feb 15, 2024

Hi, I wanted to suggest the implementation of this function filterRanges, which allows the user to filter a spectra object based on any variables of its spectraData suing a range of values that the user inputs . I believe this would be great for multiple reasons:

  • Allow multiple filtering to happen at the same time.
  • Allow for seamless integration of filtering based on any future-added or user-specific spectraData variable.
  • Prevent the creation of many variable-specific filtering functions in the future.

Welcoming any feedback on the code and PR :)

On the other hand, I would also like to suggest implementing a similar function such as filterSimilarities (not sure about the name), which would be implemented in a similar way for filtering based on similarity to specific values for variables within the spectraData of the object rather than within a range. Thus, it would have the same advantages as the filterRanges function. It would prevent variable-specific functions and make the code easier to combine multiple filters.

I can however see the counter argument that this might not be as user-friendly as variable-specific functions. I would love to get your opinion @jorainer and @lgatto on this. If you are interested i can come up with a more detailed proposition of implementation.

Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @philouail ! Great PR! Some change requests and questions.

Also, something strange happened with the roxygen documentation.

R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
@jorainer
Copy link
Member

To me this function is great and also the filterSimilarities - that I would rather suggest to call filterValues would make a lot of sense to me. @lgatto what's your opinion on that?

We have already a use case in which we want to extract/filter spectra with their precursor m/z and retention time within respective ranges (from MS1 data).

@philouail
Copy link
Collaborator Author

Thanks @jorainer ! I have adapted the code to your suggestions.

Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Phili! This function makes a lot of sense to me!

Copy link
Member

@lgatto lgatto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thank you Phili.

I would like to discuss the name of this and equivalent similarities functions. At first, the name was very confusing to me, as it reminded me of existing function that filter ranges, filterMzRange() and filterPrecursorMzRange() [1]. So I would suggest to use a name that refers to spectraData explicitly. And then use that name to inform the one focusing on similar values.

[1] grep("ange", grep("filter", ls("package:Spectra"), value = TRUE), value = TRUE)

NEWS.md Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
tests/testthat/test_Spectra.R Show resolved Hide resolved
@jorainer
Copy link
Member

Thanks for the review @lgatto ! Regarding the name, if I understand you you would like to name the function to something like filterSpectraDataRanges? Or could you provide a possible example for the name?

I would not use spectraData in the name of the function - I would understand that I would filter/subset the spectraData (i.e. remove certain columns of spectraData) instead of filtering the Spectra.

To me filterRanges would make sense, because we want to filter the Spectra using values in one or more spectraVariables (hence the name of that parameter being spectraVariables). Also, for the other method we would use filterValues. And yes, the idea was in fact that the method should be similar to e.g. filterPrecursorMzRange or filterPrecursorMzValue, so the user would find/understand the method right away. In facto filterPrecursorMzRange would be the same as calling filterRanges(, spectraVariable = "precursorMz", ...).

We also thought of a generic filterSpectra method and a RangeFilter parameter - that would be another possibility, but then we would add a new S4 object (RangeFilter) and hence increase the complexity of Spectra...

@lgatto
Copy link
Member

lgatto commented Feb 19, 2024

OK for filterRanges(), you convinced me. Please make sure in the documentation that filterPrecursorMzRange() and filterRanges(, spectraVariable = "precursorMz", ...) are equivalent.

@philouail
Copy link
Collaborator Author

Thanks @lgatto for the feedback ! I will adjust the documentation and the code accordingly. As well as adding the filterValues() function and a explanation in the vignette. I will try to make it as clear as possible how they can be used :) Putting it in draft for now while i prepare it !

@philouail philouail marked this pull request as draft February 19, 2024 15:54
@philouail
Copy link
Collaborator Author

So here is everything updated. I moved the main code to the MsBackend.R to have a similar implementation as the other filtering functions.

Some things i am not 100% sure about:

  • location of unit tests : is test_Spectra.R fine ?
  • Should tolerance be by default Inf ? In mclosest() it is but filterPrecursorMzValue() is set as 0.
  • the param values name ? there is already a value one, just want to be sure it's not confusing.

Also curiosity question but what is the different between filterPrecrusorMz, filterPrecursorMzValue, and filterPrecursorMzRange.

I will write the vignette now while i wait for feedback :) Thanks again for the help !

@jorainer
Copy link
Member

Regarding your last question: filterPrecursorMz and filterPrecursorMzValues are the same (i.e. match/filter based on similarity of the precursor m/z). Both (should?) support to match against multiple target m/z values (e.g. filterPrecursorMzValues(sps, mz = c(123.3, 234.2)) to keep spectra with a precursor m/z that matches any of the provided values). The filterPrecursorMzRange allows to define a broader range (i.e. precursor m/z within a specified range).

@jorainer
Copy link
Member

For the default of tolerance. For filterValues I would use 0 as default. also for other methods (filterPrecursorMzValues) we have 0. The mclosest (and closest) are a bit different than these filter functions: from a filter function I would expect to subset the data based on a (stringent) condition. From closest mclosest I would just like to get the most similar hits. The tolerance there is essentially a way to define that something is too different to be considered close. Hope this does make sense?

Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @philouail ! Thanks! I have some change requests - mostly related to documentation - and one suggestion to maybe add an additional parameter to both functions that allows to combine the multiple filters/conditions either with a logical and (the default) or a logical or.

R/MsBackend.R Outdated Show resolved Hide resolved
R/MsBackend.R Outdated Show resolved Hide resolved
R/MsBackend.R Outdated Show resolved Hide resolved
R/MsBackend.R Outdated Show resolved Hide resolved
R/MsBackend.R Outdated Show resolved Hide resolved
R/MsBackend.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
R/Spectra.R Outdated Show resolved Hide resolved
@philouail
Copy link
Collaborator Author

Thanks @jorainer for the idea, I'll update the code for the comments you made and implement the extra parameter !

@philouail philouail marked this pull request as ready for review March 5, 2024 15:54
@philouail philouail requested a review from jorainer March 5, 2024 15:54
Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great contribution Phili! Thanks!

I'm totally fine with the implementations, but have some suggestions to improve/fix the documentation.

R/AllGenerics.R Outdated Show resolved Hide resolved
R/MsBackend.R Outdated Show resolved Hide resolved
R/MsBackend.R Outdated Show resolved Hide resolved
R/MsBackend.R Outdated Show resolved Hide resolved
R/MsBackend.R Outdated Show resolved Hide resolved
vignettes/Spectra.Rmd Outdated Show resolved Hide resolved
vignettes/Spectra.Rmd Outdated Show resolved Hide resolved
vignettes/Spectra.Rmd Show resolved Hide resolved
vignettes/Spectra.Rmd Outdated Show resolved Hide resolved
vignettes/Spectra.Rmd Outdated Show resolved Hide resolved
Copy link
Member

@jorainer jorainer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Phili! All good from my side!

NEWS.md Outdated Show resolved Hide resolved
R/MsBackend.R Outdated
@@ -121,6 +126,11 @@
#' @param ppm For `filterPrecursorMzValues`: `numeric(1)` with the m/z-relative
#' maximal acceptable difference for a m/z to be considered matching. See
#' [closest()] for details.
#' For `filterValues`: `numeric` of any length allowing to define
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this is also missing for the other functions, but I find it useful to use () for functions, to explicitly differentiate them form variables. So here, filterValues(). If @jorainer agrees and if you have time, this could be changed everywhere. Otherwise, leave it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I can do that in the next push !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that makes sense - can you then please do that for all functions - so that we have everything consistent.

R/MsBackend.R Outdated Show resolved Hide resolved
@lgatto
Copy link
Member

lgatto commented Mar 11, 2024

Thank you @philouail for the great contribution!

@jorainer jorainer merged commit d5f27bc into main Mar 11, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants