- Installation
- Use examples
- Citation
- Benchmark dataset generation
- Reliability assessment of non-targeted data pre-processing
- Matching between BM and NPP output (background)
- Generation and interpretation of NPP performance metrics
- References
The goal of mzRAPP is to enable reliability assessment of non-targeted data pre-processing (NPP; XCMS, XCMS3, MetaboanalystR 3.0, SLAW, XCMS-online, MZmine 2, MZmine 3, MS-DIAL, OpenMS, El-MAVEN, ...) in the realm of liquid chromatography high-resolution mass spectrometry (LC-HRMS). mzRAPP's approach is based on the increasing popularity of merging non-targeted with targeted metabolomics, meaning that both types of data evaluation are often performed on the same dataset. Following this assumption, mzRAPP can utilize user-provided information on a set of molecules (the more molecules the better) with known retention behavior. mzRAPP extracts and validates chromatographic peaks, within the provided boundaries, for all (enviPat-predicted) isotopologues of those target molecules directly from mzML files. The resulting benchmark dataset is used to extract different performance metrics for NPP performed on the same mzML files. An overview of mzRAPP's capabilities is given in this <3 min YouTube video, which was originally recorded for the Metabolomics2020 conference.
First, install the most recent versions of R, RStudio, and Rtools. In the case of Rtools, make sure you also follow the subsequent instructions described on the webpage.
Afterward, it is time to install mzRAPP. You can do so by pasting the following code into the console of RStudio and pressing enter. Sometimes you are asked whether you want to update packages before installing mzRAPP. In that case, select the option “None”, since updating at this point often leads to problems with the installation. If you want to update packages, do so before or after installing mzRAPP.
if("devtools" %in% rownames(installed.packages()) == FALSE) {install.packages("devtools")}
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS="true") #this is necessary because of a problem with the mzR package
devtools::install_github("YasinEl/mzRAPP", dependencies = TRUE, build_vignettes = TRUE)
Afterwards you can run mzRAPP using:
library(mzRAPP)
callmzRAPP()
or use mzRAPP without the shiny interface as described below.
If you want to work through some examples of how to use mzRAPP and interpret its results, you can follow the use cases in the code below. More general explanations are provided in the chapter ‘Generation and interpretation of NPP performance metrics’ of this readme.
library(mzRAPP)
vignette("Vignette_mzRAPP_Example_workflow")
Please cite the following publication if you use mzRAPP in your
workflow:
Yasin El Abiead, Maximilian Milford, Reza M Salek, Gunda Koellensperger, mzRAPP: a tool for reliability assessment of data pre-processing in non-targeted metabolomics, Bioinformatics, 2021, btab231, https://doi.org/10.1093/bioinformatics/btab231
To get started, the user has to provide mzRAPP with a list of molecules with known molecular composition and retention time (RT) boundaries. mzRAPP generates extracted ion chromatograms for all isotopologues predictable for those molecules and applies the provided RT boundaries. Only molecules for which the theoretically most abundant isotopologue and at least one additional isotopologue are found are considered for the final benchmark. Isotopologue peaks with an area or height that is more than 30% off the predicted value, or with a Pearson correlation coefficient < 0.85 (as compared to the highest isotopologue), are removed. Since start and end times have to be provided for each compound, it is advisable to set those boundaries using a tool for manual peak curation from which peak boundaries can be exported, for example Skyline. Boundaries can be provided per compound, or per file and compound, as described below.
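To make these validation criteria concrete, the following sketch illustrates the two checks on a single isotopologue (a simplified illustration with hypothetical example values, not mzRAPP's internal code):

#simplified illustration of the isotopologue checks described above;
#all values are hypothetical examples, not mzRAPP internals
predicted_rel_abundance <- 0.108   #predicted abundance relative to the highest isotopologue
area_main <- 2.0e6                 #measured area of the highest isotopologue
area_iso  <- 2.4e5                 #measured area of the isotopologue to validate

#check 1: measured area/height must be within 30% of the predicted value
rel_bias <- abs(area_iso / area_main - predicted_rel_abundance) / predicted_rel_abundance
passes_abundance <- rel_bias <= 0.30

#check 2: peak shapes must correlate with the highest isotopologue (Pearson r >= 0.85)
eic_main <- c(1, 5, 20, 60, 90, 70, 30, 8, 2) * 1e4          #hypothetical EIC intensities
eic_iso  <- eic_main * predicted_rel_abundance + rnorm(9, sd = 500)
passes_shape <- stats::cor(eic_main, eic_iso) >= 0.85

passes_abundance && passes_shape   #isotopologue is kept only if both checks pass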
In order to generate a benchmark you need to provide your centroided mzML files. Conversion of files of different vendors to mzML, as well as centroiding, can be done with ProteoWizard's MSConvert.
This csv file (click here for an example) should contain two columns (a minimal sketch of such a file follows below):
sample_name: names of all mzML files from which peaks should be extracted (with or without the file extension (.mzML))
sample_group: group labels of the respective samples (e.g., treated, untreated). If there is only one group this still needs to be filled out.
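For illustration, such a file could be created from R as follows (a minimal sketch; file and group names are hypothetical):

#minimal sketch of a sample-group csv (file and group names are hypothetical)
grps <- data.table::data.table(
  sample_name  = c("sample_01.mzML", "sample_02.mzML", "sample_03.mzML"),
  sample_group = c("treated", "treated", "untreated")
)
data.table::fwrite(grps, "sample_groups.csv")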
This csv file should contain information on the target molecules/peaks (click here for an example) and include the following columns:
molecule: names of target molecules (should be unique
identifiers)
SumForm_c: Molecular composition of the neutral molecule (e.g. C10H15N5O10P2). Please make sure there is never a 0 after an element, as after the N in C12H8N0S2.
main_adduct: One main adduct has to be defined for each molecule (e.g. M+H). If the main_adduct is not detected, other adducts won't be accepted either. It therefore makes sense to select the most trusted adduct (generally M+H or M-H) as the main adduct. All adducts enabled in the enviPat package are allowed:
library(enviPat)
#>
#>
#> Welcome to enviPat version 2.4
#> Check www.envipat.eawag.ch for an interactive online version
data(adducts)
adducts$Name
#> [1] "M+H" "M+NH4" "M+Na" "M+K"
#> [5] "M+" "M-H" "M-2H" "M-3H"
#> [9] "M+FA-H" "M+Hac-H" "M-" "M+3H"
#> [13] "M+2H+Na" "M+H+2Na" "M+3Na" "M+2H"
#> [17] "M+H+NH4" "M+H+Na" "M+H+K" "M+ACN+2H"
#> [21] "M+2Na" "M+2ACN+2H" "M+3ACN+2H" "M+CH3OH+H"
#> [25] "M+ACN+H" "M+2Na-H" "M+IsoProp+H" "M+ACN+Na"
#> [29] "M+2K-H" "M+DMSO+H" "M+2ACN+H" "M+IsoProp+Na+H"
#> [33] "2M+H" "2M+NH4" "2M+Na" "2M+3H2O+2H"
#> [37] "2M+K" "2M+ACN+H" "2M+ACN+Na" "M-H2O-H"
#> [41] "M+Na-2H" "M+Cl" "M+K-2H" "M+Br"
#> [45] "M+TFA-H" "2M-H" "2M+FA-H" "2M+Hac-H"
#> [49] "3M-H"
user.rtmin: Start time of the peak (seconds). If possible, mzRAPP will narrow peak boundaries to intersect with the extracted ion chromatogram at 5% of the maximum peak height. It is also worth noting that peaks for which user.rtmin/user.rtmax are provided will still be rejected if the isotopic information does not fit.
user.rtmax: End time of
peak (seconds).
adduct_c: (optional) If this column is added, there has to be at least one row per molecule where adduct_c is the same as the selected main_adduct. After that, additional rows with different adducts can be added. Please note that adducts which should be screened for all molecules can be selected more easily using the adduct-selection box (see below). It therefore only makes sense to use the adduct_c column if you want some adducts to be screened only for specific molecules.
StartTime.EIC: (optional) Start time for chromatograms extracted
for this molecule (seconds). Peaks are only detected from this time on.
If not given, StartTime.EIC and EndTime.EIC are calculated from user.rtmin and user.rtmax.
EndTime.EIC: (optional) End time
for chromatograms extracted for this molecule (seconds). Peaks are only
detected up to this time.
FileName: (optional) Name of the sample file with or without the file extension. Using this allows applying different values (like user.rtmin/user.rtmax) to different files.
Additional columns: (optional) It is possible to add additional columns. Those will be kept in the final benchmark dataset. A minimal sketch of a target table follows below.
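As a minimal sketch (all molecules, formulas, and retention times are hypothetical examples), such a target table could be assembled like this:

#minimal sketch of a target-molecule csv (all values are hypothetical examples)
targets <- data.table::data.table(
  molecule    = c("ATP", "glucose"),
  SumForm_c   = c("C10H16N5O13P3", "C6H12O6"),
  main_adduct = c("M-H", "M-H"),
  user.rtmin  = c(231.5, 118.0),   #peak start times (seconds)
  user.rtmax  = c(245.0, 131.5)    #peak end times (seconds)
)
data.table::fwrite(targets, "target_table.csv")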
Next, the mass resolution of the instrument used has to be selected. This is necessary in order to apply the correct mass resolution for any given m/z value when isotopologues are predicted for different molecular formulas. All instruments enabled via the enviPat package can be selected from the enviPat resolution list. For other instruments, a custom resolution list has to be uploaded as a .csv file. This .csv file has to consist of two columns:
R: resolution value at half height of a mass peak
m/z: m/z value for the corresponding resolution
Resolution values for at least 10 equally distributed m/z values are recommended (see the sketch below).
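A custom resolution list could, for instance, be written like this (a sketch; the resolution values follow a typical Orbitrap-like decay and are purely illustrative):

#sketch of a custom resolution csv; values are illustrative placeholders
mzs <- seq(100, 1000, by = 100)             #10 equally distributed m/z values
custom_res <- data.table::data.table(
  R     = round(120000 * sqrt(200 / mzs)),  #hypothetical resolution at half height
  `m/z` = mzs
)
data.table::fwrite(custom_res, "custom_resolution_list.csv")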
You can select any number of additional adducts to screen for. These adducts will be screened for each molecule for which the selected main_adduct has been found. Please note that (as for the main_adduct) adducts will only be added if at least two isotopologues of the respective adduct can be detected.
In the next step a few parameters have to be set:
Lowest iso. to
be considered [%]: Lowest relative isotopologue abundance to be
considered for each molecule (>= 0.05).
Min. # of scans per
peak: Minimum number of points for a chromatographic peak to be
considered as such.
mz precision [ppm]: Maximum spread of
mass peaks in the mz dimension to be still considered part of the same
chromatogram.
mz accuracy [ppm]: Maximum difference
between the accurate mz of two ion traces to be considered to be
originating from the same ion.
Processing plan: How should the benchmark generation be done? sequential (only using one core; often slow but does not use much RAM) or multiprocess (using multiple cores; faster but needs more RAM). A sketch of the two plans follows below.
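When working from the console instead of the shiny interface, the equivalent choice can presumably be made via the future framework, which mzRAPP imports (a sketch under that assumption; verify against your mzRAPP version):

#sketch of the two processing plans via the future framework (assumption: benchmark
#generation respects the active future plan; mzRAPP imports doFuture/future)
library(future)
plan(sequential)                                   #one core; slow but low RAM usage
plan(multisession, workers = availableCores() - 1) #multiple cores; faster, more RAM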
Benchmark generation can be started using the blue Start button. The
necessary time for the generation depends on the number of mzML files,
the number of target compounds, and of course computational resources.
Typically this process takes minutes to hours. Afterward, the generated benchmark dataset is automatically exported to the working directory as a csv file.
An overview of different benchmark key data is provided in the “View
Benchmark” panel. The plots can be used to inspect different qualities
of the dataset. For information on the peak variables calculated please
check ?mzRAPP::find_bench_peaks. Also, please note that a molecule not being detected does not necessarily mean that there is no peak, but that mzRAPP was not able to validate it. This can happen since at least two isotopologues fulfilling strict criteria regarding abundance, peak shape correlation, and the number of points per peak are required for a given molecule to be retained in the benchmark. To get a better overview of picked peaks, two csv files as well as instructions for their application in Skyline can be exported. Those can (but do not have to) be used to generate a mirror image of the benchmark dataset in the free software Skyline.
When the benchmark is satisfactory it can be used for reliability assessment of non-targeted data pre-processing as explained in the following chapter.
#load packages
library(mzRAPP)
library(enviPat)
#read the target table and sample group table described above (placeholder paths)
targets <- data.table::fread("Path_to_target_table.csv")
grps <- data.table::fread("Path_to_sample_group_table.csv")
files <- list.files("Path_to_mzML_folder", pattern = "\\.mzML$", full.names = TRUE) #mzML files
#load resolution data & select instrument/resolution
data("resolution_list")
res_list <- resolution_list[["OTFusion,QExactiveHF_120000@200"]]
#calculate theoretic mz values and abundances for all isotopologues at the given mass resolution using enviPat
Th_isos <- get_mz_table(targets, #table with information on target molecules as described above
instrumentRes = res_list
)
#find regions of interest (ROIs) for theoretic isotopologues
rois <- get_ROIs(files = files, #vector of mzML file names (including paths)
Target.table = Th_isos,
PrecisionMZtol = 5, #mass precision of the used mass spectrometer
AccurateMZtol = 5 #mass accuracy of the used mass spectrometer
)
#Extract/evaluate peaks
PCal <- find_bench_peaks(files = files, #vector of mzML file names (including paths) as described above
Grps = grps, #table including information on group assignment of each provided mzML file as described above
CompCol_all = rois,
Min.PointsperPeak = 8, #minimum number of points expected from a given peak
max.mz.diff_ppm = 5 #mass accuracy of the used mass spectrometer
)
#save the resulting benchmark to a csv file
data.table::fwrite(PCal, file = "Peak_list.csv", row.names = FALSE)
Reliability assessment of NPP can be set up in the panel “Setup NPP
assessment”. First, the tool to be evaluated has to be set. Afterward,
the unaligned and aligned output files of the tools to be assessed have
to be selected. The way those files can be exported from the different tools is outlined in the following. For all of these outputs it should be noted that mzRAPP is heavily dependent on all isotopologues being reported in the output feature tables. For that reason, they must not be filtered out for mzRAPP to work properly.
XCMS (R-version):
#unaligned file:
data.table::fwrite(xcms::peaks(xcmsSet_object), "unaligned_file.csv")
#aligned file:
data.table::fwrite(xcms::peakTable(xcmsSet_object), "aligned_file.csv")
XCMS3 (R-version):
#unaligned file:
data.table::fwrite(xcms::chromPeaks(XCMSnExp_object), "unaligned_file.csv")
#aligned file:
feature_defs <- xcms::featureDefinitions(XCMSnExp_object)
feature_areas <- xcms::featureValues(XCMSnExp_object, value = "into")
df <- merge(feature_defs, feature_areas, by=0, all=TRUE)
df <- df[,!(names(df) %in% c("peakidx"))]
data.table::fwrite(df, "aligned_file.csv")
XCMS online:
When starting a run on XCMS online, make sure that retention times are always given in seconds. After processing, download all results from XCMS online via the button “Download Results”. Afterwards, extract all results from the zipped folder.
unaligned
file: select the xcms3xset.Rda file
aligned file: select the same
xcms3xset.Rda file
MS-DIAL:
unaligned files:
Export -> Peak list result -> [Add all files] -> [set Export
format to txt]
aligned file: When performing the alignment make
sure to activate the isotope tracking option in the alignment step (for
most cases selecting 13C and 15N as labeling elements will be adequate).
Afterwards export via: Export -> Alignment result -> [check Raw
data matrix Area] -> [set Export format to msp]
MZmine 2:
unaligned files: [select all files generated in
the chromatogram deconvolution step] -> Feature list methods ->
Export/Import -> Export to CSV file -> [set Filename including
pattern/curly brackets (e.g. blabla_{}_blabla.csv)] -> [check
“Peak name”, “Peak height”, “Peak area”, “Peak RT start”, “Peak RT end”,
“Peak RT”, “Peak m/z”, “Peak m/z min” and “Peak m/z max”] -> [set
Filter rows to ALL]
aligned file: [select the file after the alignment step] -> Feature list methods -> Export/Import -> Export to CSV file -> [in addition to the boxes checked for the unaligned files, check “Export row retention time” and “Export row m/z”]
MZmine
3:
unaligned files: [select all files generated in the
chromatogram deconvolution step] -> Feature list methods ->
Export feature list -> CSV (legacy MZmine 2) -> [set Filename
including pattern/curly brackets (e.g. blabla_{}_blabla.csv)] ->
[check “Feature name”, “Peak height”, “Peak area”, “Feature RT start”,
“Feature RT end”, “Feature RT”, “Feature m/z”, “Feature m/z min” and
“Feature m/z max”] -> [set Filter rows to ALL]
aligned file: [select the file after the alignment step] -> Feature list methods -> Export feature list -> CSV (legacy MZmine 2) -> [in addition to the boxes checked for the unaligned files, check “Export row retention time” and “Export row m/z”]
El-MAVEN:
unaligned file:
[click the “Export csv” button in the “Peak Table”-panel] -> Export
all groups -> [select “Peaks Detailed Format Comma Delimited
(.csv)”]
aligned file: [click the “Export csv” button in the
“Peak Table”-panel] -> Export all groups -> [select “Groups
Summary Matrix Format Comma Delimited (.csv)”]
OpenMS:
When running the FeatureFinderMetabo algorithm, make sure to set local_rt_range as well as local_mz_range to 0. You will have to check ‘Show advanced parameters’ to make those parameters visible. Also set report_convex_hulls to true.
unaligned file: [Connect a
TextExporter node with separator set to ‘,’ directly to the
FeatureFinderMetabo node] -> [connect TextExporter to Output
Folder]
aligned file: [Connect a TextExporter node with separator
set to ‘,’ directly to the FeatureLinkerUnlabeledQT node] ->
[connect TextExporter to Output Folder]
PatRoon:
patRoon is not supported directly, but its output can still be loaded since patRoon allows generating xcmsSet objects internally. Hence, it has to be loaded into mzRAPP as “XCMS” output, which has to be stated as such in the “Setup NPP assessment” tab.
xcmsSet_object <- patRoon::getXCMSSet(patRoon_features_object)
#unaligned file:
data.table::fwrite(xcms::peaks(xcmsSet_object), "unaligned_file.csv")
#aligned file:
data.table::fwrite(xcms::peakTable(xcmsSet_object), "aligned_file.csv")
MetaboanalystR 3.0:
MetaboanalystR 3.0 can be used to optimize XCMS parameters. It also allows subsequent data pre-processing within the Metaboanalyst framework, leading to a slightly different output format compared to XCMS. After applying the ‘FormatPeakList’ function from MetaboanalystR, the unaligned and aligned outputs for mzRAPP can be exported as follows:
#unaligned file:
UnalignedOP <- data.table::as.data.table(FormatPeakList_output@peakpicking[["chromPeaks"]])
data.table::fwrite(UnalignedOP, "unaligned_file.csv")
#aligned file:
AlignedOP <- data.table::as.data.table(FormatPeakList_output@dataSet)
data.table::fwrite(AlignedOP, "aligned_file.csv")
SLAW:
SLAW automatically exports all necessary files.
unaligned file: SLAW-output folder ->
folder_named_like_used_algorithm (e.g. CENTWAVE) -> csv files in
folder peaktables
aligned file: SLAW-output folder ->
datamatrices -> csv-file named datamatrix_xxxxxxx
any other tool:
It is also possible to use the outputs of other tools. However, they have to be adapted so that they resemble the output of one of the tools listed above. If this is not possible in some case but important to you, please contact me at yasin.el.abiead@univie.ac.at.
Next, the benchmark file has to be selected. If a benchmark has been created during this shiny session (i.e., the benchmark is still visible in the benchmark overview panel), the switch button “Use generated benchmark” can be clicked as an alternative. After performing those steps, the assessment can be started via the blue “Start assessment” button.
#select the output format of which tool you would like to read in (exportable from the different tools as described above)
#options: XCMS, XCMS3, Metaboanalyst, SLAW, El-Maven, OpenMS, MS-DIAL, MZmine 3, or MZmine 2
algo = "XCMS"
#load benchmark csv file
benchmark <-
check_benchmark_input(
file = "Path_to_benchmark_csv_file.csv",
algo = algo
)
#load non-targeted output
NPP_output <-
check_nonTargeted_input(
ug_table_path = "Path_to_unaligned_Output_csv_file.csv/.txt", #in case of multiple files (e.g. for MS-DIAL) supply a vector of paths
g_table_path = "Path_to_aligned_Output_csv_file.csv/.txt",
algo = algo,
options_table = benchmark$options_table
)
#compare benchmark with non-targeted output
comparison <-
compare_peaks(
b_table = benchmark$b_table,
ug_table = NPP_output$ug_table,
g_table = NPP_output$g_table,
algo = algo
)
#generate a list including different statistics of the comparison
comp_stat <- derive_performance_metrics(comparison)
#generate different sunburst plots for overview
plot_sunburst_peaks(comp_stat, comparison)
plot_sunburst_peakQuality(comp_stat, comparison)
plot_sunburst_alignment(comp_stat)
Before any NPP performance metrics can be generated, mzRAPP matches the non-targeted data pre-processing (NPP) output files against the benchmark (BM). How this is done is explained in the following.
Matching of benchmark peaks with NT peaks before alignment:
Each benchmark peak (BP) is reported with the smallest and highest mz value contributing to the chromatographic peak. To be considered as a possible match for a BP, the mz value of an NT peak (NP) has to fall within these two values. Matching rules considering retention time (RT) are depicted in Figure 1a. An NP has to cover the whole core of a BP (referred to as “peak core” in the following) while having an RT within the borders (RTmin, RTmax) of the BP. If only a part of the peak core is covered by an NP with its RT within the borders of the BP, the NP is counted as a split peak. NPs which do not overlap with the core of a BP are not considered. In some cases, there can be more than one match between a BP and NPs (Figure 1d). In those cases, BPs corresponding to the same molecule but other isotopologues (IT) are considered in order to choose the NP leading to the smallest relative IT-ratio bias (calculated via reported peak abundances) as compared to the predicted IT ratio (Figure 1c). When more than two different ITs are detected, the ratio of each IT to the highest IT reported via NPP is considered.
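The selection among multiple matches can be sketched as follows (a simplified illustration with hypothetical values; mzRAPP's internal code is more general):

#choosing among competing NP matches via the relative IT-ratio bias
predicted_ratio <- 0.108            #predicted IT ratio vs. the highest isotopologue
area_highest_IT <- 2.0e6            #NPP-reported area of the highest isotopologue
candidate_areas <- c(1.9e5, 2.3e5)  #reported areas of two competing NP matches

rel_bias <- abs(candidate_areas / area_highest_IT - predicted_ratio) / predicted_ratio
which.min(rel_bias)                 #the NP with the smallest relative bias is kept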
Matching of benchmark features with non-targeted features:
For an NT feature (NF) to be considered as a match for a benchmark feature (BF), its reported mz and RT values have to lie between the lowest/highest benchmark peak mzmin/mzmax and RTmin/RTmax of the considered benchmark feature, respectively (Figure 1b). Moreover, NFs containing NPs to which BPs have been matched before alignment are also considered, even when those NFs were reported with RT and mz values not fitting the BM information. In the case of multiple matches for the same BF, the same strategy as for NPs is applied (Figure 1c and d). However, in the case of features, the mean area is calculated over all samples for which a BP is present.
Matching of non-targeted peaks to non-targeted features:
To count alignment errors (explained below) it is necessary to trace non-targeted peaks (NP) reported before alignment in the non-targeted features (NF) of the aligned file. This is done by matching the exact area reported in the unaligned file to the same area in the aligned file. This generally works since areas usually do not change during the alignment process. In the case of multiple area matches, the match pointing towards the aligned feature with the most other matches (in other files) is selected. If a peak cannot be detected in the aligned file but was detected in the unaligned file, it is considered as lost during the alignment process, which is reported in one of the performance metrics (see below). This implies that when areas are willfully changed after the peak picking step (other than adding missing peaks via some version of a fill-gaps algorithm), the number of missed peaks will not be reliable anymore. However, other alignment error counts (detailed below) could still be reliable.
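In essence, the tracing works like the following sketch (hypothetical values; the actual implementation also resolves ties across files as described above):

#tracing unaligned peaks into an aligned feature via exact area matching
unaligned_areas <- c(15321.7, 98210.4, 55512.9)  #areas from the unaligned file
aligned_areas   <- c(55512.9, 15321.7, 120031.2) #areas in one row of the aligned file

idx <- match(unaligned_areas, aligned_areas)     #NA = peak lost during alignment
sum(is.na(idx))                                  #number of lost peaks (here: 1)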
Reliability assessment results can be inspected in the panel “View NPP assessment”. Using the shiny user interface, different performance metrics are displayed at the top of the panel. All metrics are given with an empirical confidence interval (alpha = 0.95) in percent, which is supposed to be representative of the entirety of (unknown) peaks in the provided raw data (mzML files). It is calculated by summing up the individual metrics (described below) per molecule and then bootstrapping molecules (R = 1000). Confidence interval values < 0 are rounded up to 0.
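The following sketch shows how such an interval can be obtained with the boot package (a simplified illustration with hypothetical per-molecule counts, not mzRAPP's exact code):

#bootstrapped confidence interval for a per-molecule performance metric
library(boot)
per_molecule <- data.frame(found = c(3, 5, 0, 4, 2),  #hypothetical counts per molecule
                           total = c(4, 5, 2, 4, 3))

ratio_fun <- function(d, i) 100 * sum(d$found[i]) / sum(d$total[i]) #metric in percent
b <- boot(per_molecule, ratio_fun, R = 1000)                        #resample molecules
ci <- quantile(b$t, c(0.025, 0.975))                                #empirical 95% interval
pmax(ci, 0)                                                         #values < 0 rounded up to 0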
The number of benchmark peaks for which a match was found among the
unaligned/aligned NPP results vs all peaks present in the benchmark (as
shown in Figure 2). For explanations on how the matching of benchmark
peaks with non-targeted peaks is performed please read the section
“Matching between BM and NPP output (background)” above.
Interpretation:
This allows estimating the percentage of peaks found/not found by non-targeted data pre-processing at the peak picking step and after feature processing. It is worth noting that peaks not detected during peak detection might be added later via fill-gaps algorithms or misalignment of split peaks (or other peaks). Peaks that were detected during peak detection but not recovered after feature processing could be explained by alignment errors (this would be visible in the number of alignment errors or peaks lost during alignment, detailed below).
Displayed in results as:
Number of benchmark peaks with at least one NPP match / Number of
benchmark peaks (confidence interval)
The number of split peaks that have been found for all benchmark peaks.
For a graphical explanation of a split peak please check Figure 1a. It
is worth noting that there can be more than one split peak per benchmark
peak.
Interpretation:
Split peaks account for peaks with very poorly set integration boundaries. A high number of split peaks can introduce problems into the alignment process as well as produce a high number of artificial noise features.
Displayed in
results as:
Number of split peaks found over all benchmark
peaks / (Number of benchmark peaks with at least one NPP match + Number
of split peaks found over all benchmark peaks) (confidence interval)
The classification of not-found peaks (as defined in Figure 2) into high and low is done for each benchmark feature individually (as shown in Figure 3). Since we do not want to rely on the benchmark alignment being correct, we only classify missing values if the alignment of the benchmark is in agreement with the alignment performed by NPP for at least one isotopologue (as also described in Figure 5). Classification is based on the lowest benchmark peak present in the respective feature which has been found by the non-targeted algorithm. All benchmark peaks in this feature that have a benchmark area more than 1.5 times higher than the lowest benchmark peak found via the non-targeted approach are considered high. Otherwise, they are considered low. Additionally, if no peak has been found over the whole benchmark feature, all corresponding peaks are classified as lost peaks. A simplified sketch follows below.
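The classification rule can be sketched as follows (hypothetical values; a simplified illustration of the logic described above):

#classifying missing peaks of one benchmark feature as high/low/lost
bm_areas  <- c(1.2e5, 3.5e5, 0.9e5, 2.8e5) #benchmark areas within the feature
found_npp <- c(TRUE, TRUE, FALSE, FALSE)   #was each peak found by the NPP tool?

if (any(found_npp)) {
  lowest_found <- min(bm_areas[found_npp])
  #missing peaks > 1.5x the lowest found benchmark peak count as "high"
  classification <- ifelse(bm_areas[!found_npp] > 1.5 * lowest_found, "high", "low")
} else {
  classification <- rep("lost", sum(!found_npp)) #no peak found in the whole feature
}
classification  #here: "low" "high"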
Interpretation:
It is important to know whether missing peaks are generally due to their low intensities or whether there is a significant risk of missing higher peaks. This is especially relevant when missing-value imputation methods are used, since these are often based on one of those assumptions. A high number of high missing peaks at the peak picking step is common in features in which a majority of peaks have a poor signal-to-noise ratio. After alignment, it can also be explained by problems in the peak alignment (this would be visible in the number of alignment errors, as detailed below).
Displayed in
results as:
Number of high missing values / (Number of high
missing values + Number of low missing values) (confidence interval)
Isotopologue abundance ratios (IR) are calculated relative to the highest isotopologue of each molecule. If the relative bias of an IR calculated using NPP abundances exceeds the tolerance (outlined in Figure 4), it is reflected in this variable.
Interpretation:
IRs are used to make a judgment on the quality of reported peak abundances. Since IRs can be predicted from molecular formulas and are confirmed during benchmark dataset generation, they can be considered a reliable reference point for non-targeted data pre-processing assessment. A high number of degenerated IRs at the peak detection step could be explained by problems in setting integration boundaries or by problems in the extraction of chromatograms (e.g., too wide or too narrow binning of mz values). After the feature processing step, it could also be explained by alignment errors (this would be visible in the number of alignment errors, as detailed below) or by problems in the fill-gaps process.
Displayed in results as:
Number of isotopologue ratios exceeding tolerance / Number of isotopologue ratios which can be calculated using reported NPP abundances (confidence interval)
We use a form of error counting which does not rely on the correct alignment of the benchmark dataset itself. Figure 5 shows three isotopologues (IT) of the same benchmark molecule detected in 5 samples. The color coding indicates the feature each peak has been assigned to by the NPP algorithm. Whenever there is an asymmetry in the assignment of the different ITs, the number of steps necessary to reverse that asymmetry is counted as errors. Counting benchmark divergences, on the other hand, assumes correct alignment of the benchmark dataset. By that nature, the number of errors is a subset of all benchmark alignment divergences. Finally, lost peaks correspond to peaks which had been matched at the peak detection step but are not present anymore after the alignment step. All those counts are given in the output of the NPP assessment.
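As a heavily simplified sketch of the asymmetry counting (the actual counting logic in mzRAPP is more elaborate; values are hypothetical):

#counting steps needed to remove asymmetries in IT-to-feature assignments;
#rows = three ITs of one molecule, columns = 5 samples, entries = NPP feature labels
assignment <- matrix(c("F1", "F1", "F1", "F2", "F1",   #IT1
                       "F1", "F1", "F1", "F1", "F1",   #IT2
                       "F1", "F1", "F1", "F1", "F1"),  #IT3
                     nrow = 3, byrow = TRUE)

#per sample: reassignments needed until all ITs point to the same feature
errors_per_sample <- apply(assignment, 2, function(x) length(x) - max(table(x)))
sum(errors_per_sample)  #here: 1 (IT1 in sample 4 deviates)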
Interpretation:
A high number of alignment errors can be partly due to a high number of split peaks (discussed above) or a too wide/narrow search width in the mz or RT dimension by NPP. Lost peaks can often be attributed to NPP alignment settings which delete features if they are not populated by enough peaks, or to peak areas being adapted in the feature processing step, in which case the number of lost peaks becomes unreliable (however, we are currently not aware of an NPP tool doing that). It is worth noting that some NPP alignment algorithms lose peaks by aligning only one peak per time unit and file and deleting all other peaks falling into the same time unit (e.g., xcms’ group.density()).
Displayed in results as:
Number of one of three error types / Number of benchmark peaks for which an NPP match has been found (confidence interval)
mzRAPP builds on many other R packages. Specifically, it depends on data.table, ggplot2, shinyjs and dplyr, and imports shinybusy, shinydashboard, shiny, shinyWidgets, doFuture, plotly, tcltk, hutils, DT, tools, retistruct, xcms, multtest, enviPat, stats, parallel, doBy, shinycssloaders, BiocParallel, doParallel, MSnbase, DescTools, signal, intervals, future.apply, foreach, S4Vectors, V8, boot, future, bit64, htmltools, kableExtra, shinythemes and knitr from CRAN and Bioconductor. Moreover, we implemented code from two Stack Overflow contributions (https://stackoverflow.com/questions/37169039/direct-link-to-tabitem-with-r-shiny-dashboard?rq=1 and https://stackoverflow.com/questions/57395424/how-to-format-data-for-plotly-sunburst-diagram).