graphPeaks (recreate PR after merging) #23

Closed
wants to merge 25 commits into from
25 commits
ba57a62
add function dotproduct and unit tests
tnaake Aug 29, 2019
24e496a
style: add spaces between = in function arguments
tnaake Sep 17, 2019
6ed0d3e
refactor: remove normalize as an argument
tnaake Sep 17, 2019
0b5c377
refactor: check if mz values are identical and raise a warning if the…
tnaake Sep 17, 2019
f48fe4b
docs: rewrite details section, add references section
tnaake Sep 17, 2019
23b6e12
docs: remove tags name and usage in documenation
tnaake Sep 17, 2019
75e113d
docs: explain better arguments m and n, add explanation to details se…
tnaake Sep 17, 2019
9a8530c
test: remove argument normalize, add tests with different length to u…
tnaake Sep 17, 2019
1411ae8
feat: write dotproduct to NAMESPACE
tnaake Sep 17, 2019
cea33a7
Merge remote-tracking branch 'upstream/master'
tnaake Sep 17, 2019
565bc0f
docs: add contributor
tnaake Sep 17, 2019
5ac3015
docs: add dotproduct in NEWS file
tnaake Sep 17, 2019
04adafa
test: add expect_warning since warnings are treated as errors
tnaake Sep 17, 2019
827112a
docs: change n=2 to n=0 in order to not create a WARNING
tnaake Sep 17, 2019
45314e7
test: add tests for m and n are numeric and length identical of x, x,…
tnaake Sep 17, 2019
f3dfa62
fix: use matrix instead of data.frame for dotproduct input
tnaake Sep 29, 2019
8a8036e
feat: add new graphPeaks.R
tnaake Sep 29, 2019
eb20aba
docs: update contribution
tnaake Sep 29, 2019
707f365
add function graphPeaks
tnaake Oct 3, 2019
76b3a80
refactor: change to matrix in dotproduct
tnaake Oct 3, 2019
6e1d45b
refactor: unit tests for graphPeaks, checkEquals for result of graphP…
tnaake Oct 3, 2019
9717110
test: add tests for graphPeaks
tnaake Oct 3, 2019
0507891
test: add tests for graphPeaks
tnaake Oct 3, 2019
c214dba
test: add regexpr for expect_warning and expect_error
tnaake Oct 6, 2019
93f3787
test: add regexpr for expect_warning and expect_error
tnaake Oct 6, 2019
4 changes: 3 additions & 1 deletion DESCRIPTION
@@ -15,13 +15,15 @@ Authors@R: c(person(given = "Laurent", family = "Gatto",
email = "mail@sebastiangibb.de",
role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-7406-4443")),
person(given = "Sigurdur", family = "Smarason", role = "ctb")
person(given = "Sigurdur", family = "Smarason", role = "ctb"),
person(given = "Thomas", family = "Naake", role = "ctb")
)
Author: Laurent Gatto, Johannes Rainer and Sebastian Gibb.
Maintainer: Laurent Gatto <laurent.gatto@uclouvain.be>
Depends:
R (>= 3.6.0)
Imports:
igraph,
methods,
S4Vectors
Suggests:
5 changes: 5 additions & 0 deletions NAMESPACE
@@ -10,12 +10,15 @@ export(coefMA)
export(coefSG)
export(coefWMA)
export(common)
export(dotproduct)
export(graphPeaks)
export(join)
export(localMaxima)
export(noise)
export(ppm)
export(rbindFill)
export(refineCentroids)
export(shiftMatrix)
export(smooth)
export(valleys)
export(vapply1c)
@@ -24,6 +27,8 @@ export(vapply1l)
importClassesFrom(S4Vectors,Rle)
importFrom(S4Vectors,Rle)
importFrom(S4Vectors,nrun)
importFrom(igraph,components)
importFrom(igraph,graph_from_adjacency_matrix)
importFrom(methods,as)
importFrom(methods,is)
importFrom(stats,filter)
1 change: 1 addition & 0 deletions NEWS.md
@@ -4,6 +4,7 @@

- Add `asInteger` and `rbindFill`.
- Add `asRle`, `asRleDataFrame` and `asVectorDataFrame`.
- Add `dotproduct`

## MsCoreUtils 0.0.1

102 changes: 102 additions & 0 deletions R/dotproduct.R
@@ -0,0 +1,102 @@
#' @title Calculate the normalized dot product
#'
#' @description
#' Calculate the normalized dot product (NDP). `dotproduct` returns a numeric
#' value ranging between 0 and 1, where 0 indicates no similarity between the
#' two MS/MS features, while 1 indicates that the MS/MS features are identical.
#'
#' @param
#' x `matrix` with two columns, one containing m/z values (column `"mz"`) and
#' the other the corresponding intensity values (column `"intensity"`)
#'
#' @param
#' y `matrix` with two columns, one containing m/z values (column `"mz"`) and
#' the other the corresponding intensity values (column `"intensity"`)
#'
#' @param m `numeric(1)`, exponent for peak intensity-based weights
#'
#' @param n `numeric(1)`, exponent for m/z-based weights
#'
#' @details
#' Each row in `x` corresponds to the respective row in `y`, i.e. the peaks
#' (entries `"mz"`) per spectrum have to match.
#'
#' `m` and `n` are exponents that weight the peak intensities and the m/z
#' values, respectively. By default (`m = 0.5`), the square root of the
#' intensity values is used to calculate the weights. With increasing values
#' for `m`, high intensity values become more important for the similarity
#' calculation, i.e. differences between intensities are amplified.
#' With increasing values for `n`, high m/z values contribute more to the
#' similarity calculation. Especially when working with small molecules, a
#' value `n > 0` can be set to put weight on the m/z values, reflecting that
#' shared fragments with higher m/z are less likely and therefore suggest
#' that the molecules are more similar. If `n != 0`, a warning is raised if
#' the corresponding m/z values are not identical, since small differences in
#' m/z values distort the similarity values with increasing `n`. If `m = 0` or
#' `n = 0`, intensity values or m/z values, respectively, are not taken into
#' account.
#'
#' The normalized dot product is calculated according to:
#' \deqn{NDP = \frac{(\sum(W_{S1, i} \cdot W_{S2, i})) ^ 2}{\sum(W_{S1, i} ^ 2) \cdot \sum(W_{S2, i} ^ 2)}}{NDP = (sum(W_S1,i * W_S2,i))^2 / (sum(W_S1,i^2) * sum(W_S2,i^2))},
#' with \eqn{W = [peak intensity] ^ m \cdot [m/z] ^ n}.
#' For further information on the normalized dot product see, for example,
#' Li et al. (2015).
#' Prior to calculating \eqn{W_{S1}} or \eqn{W_{S2}}, all intensity values
#' are divided by the maximum intensity value and multiplied by 100.
#'
#' @references
#' Li et al. (2015): Navigating natural variation in herbivory-induced
#' secondary metabolism in coyote tobacco populations using MS/MS structural
#' analysis. PNAS, E4147--E4155, DOI: 10.1073/pnas.1503106112.
#'
#' @return
#' `numeric(1)`, `dotproduct` returns a numeric similarity coefficient between
#' 0 and 1.
#'
#' @author Thomas Naake, \email{thomasnaake@@googlemail.com}
#'
#' @examples
#' x <- matrix(c(c(100.002, 100.001, NA, 300.01, 300.02, NA),
#'     c(2, 1.5, 0, 1.2, 0.9, 0)), ncol = 2)
#' y <- matrix(c(c(100.0, NA, 200.0, 300.002, 300.025, 300.0255),
#'     c(2, 0, 3, 1, 4, 0.4)), ncol = 2)
#' colnames(x) <- colnames(y) <- c("mz", "intensity")
#' dotproduct(x, y, m = 0.5, n = 0)
#'
#' @export
dotproduct <- function(x, y, m = 0.5, n = 0) {

    ## check valid input
    if (!is.matrix(x)) stop("'x' is not a matrix")
    if (!is.matrix(y)) stop("'y' is not a matrix")

    if (nrow(x) != nrow(y)) stop("nrow(x) and nrow(y) are not identical")
    if (!is.numeric(m) || length(m) != 1)
        stop("`m` has to be a numeric of length 1.")
    if (!is.numeric(n) || length(n) != 1)
        stop("`n` has to be a numeric of length 1.")

    ## retrieve m/z and intensity from x and y
    mz1 <- x[, "mz"]
    mz2 <- y[, "mz"]
    inten1 <- x[, "intensity"]
    inten2 <- y[, "intensity"]

    ## check mz values: if mz1 and mz2 are not identical and the values are
    ## weighted by n, this might lead to unexpected results in the similarity
    ## calculation
    if (n && any(mz1 != mz2, na.rm = TRUE))
        warning("m/z values in `x` and `y` are not identical. ",
            "For n != 0 this might yield unexpected results.")

    ## scale intensities so that the highest peak of each spectrum is 100
    inten1 <- inten1 / max(inten1, na.rm = TRUE) * 100
    inten2 <- inten2 / max(inten2, na.rm = TRUE) * 100

    ## weights: W = intensity ^ m * mz ^ n
    ws1 <- inten1 ^ m * mz1 ^ n
    ws2 <- inten2 ^ m * mz2 ^ n

    ## calculate normalized dot product
    dp <- sum(ws1 * ws2, na.rm = TRUE)
    dp ^ 2 / (sum(ws1 ^ 2, na.rm = TRUE) * sum(ws2 ^ 2, na.rm = TRUE))
}
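
The behaviour described in the details section can be checked with a small stand-alone sketch. `ndp_sketch` is a hypothetical helper written for this illustration; it takes plain vectors instead of the `"mz"`/`"intensity"` matrices and omits the input checks and the warning, so it is not the package function, only a restatement of the same formula.

```r
## Hypothetical helper (not part of the PR): the NDP formula on plain vectors.
ndp_sketch <- function(mz1, int1, mz2, int2, m = 0.5, n = 0) {
    ## scale intensities so that the highest peak of each spectrum is 100
    int1 <- int1 / max(int1, na.rm = TRUE) * 100
    int2 <- int2 / max(int2, na.rm = TRUE) * 100
    ## weights: W = intensity ^ m * mz ^ n
    w1 <- int1 ^ m * mz1 ^ n
    w2 <- int2 ^ m * mz2 ^ n
    sum(w1 * w2, na.rm = TRUE) ^ 2 /
        (sum(w1 ^ 2, na.rm = TRUE) * sum(w2 ^ 2, na.rm = TRUE))
}

mz <- c(100, 200, 300)
a <- c(10, 50, 100)
b <- c(100, 50, 10)

## identical spectra: similarity should be 1
same <- ndp_sketch(mz, a, mz, a)

## same spectrum at twice the intensity: base-peak scaling removes the factor
scaled <- ndp_sketch(mz, a, mz, 2 * a)

## no shared peaks: similarity should be 0
disjoint <- ndp_sketch(mz, c(100, 0, 0), mz, c(0, 100, 0))

## a larger m amplifies intensity differences and lowers the similarity
sqrt_weighted <- ndp_sketch(mz, a, mz, b, m = 0.5)
squared_weighted <- ndp_sketch(mz, a, mz, b, m = 2)

print(c(same = same, scaled = scaled, disjoint = disjoint,
    m_0.5 = sqrt_weighted, m_2 = squared_weighted))
```

With `n = 0` the m/z values drop out entirely (`mz ^ 0` is 1), which is why the sketch reuses one `mz` vector for every comparison.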
