Title: | Multiple Imputation for Proteomics |
---|---|
Description: | A framework for multiple imputation for proteomics is proposed by Marie Chion, Christine Carapito and Frederic Bertrand (2021) <doi:10.1371/journal.pcbi.1010420>. It is dedicated to dealing with multiple imputation for proteomics. |
Authors: | Marie Chion [aut] , Christine Carapito [aut] , Frederic Bertrand [cre, aut] , Gordon Smyth [ctb], Davis McCarthy [ctb], Hélène Borges [ctb], Thomas Burger [ctb], Quentin Giai-Gianetto [ctb], Samuel Wieczorek [ctb] |
Maintainer: | Frederic Bertrand <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2 |
Built: | 2025-01-29 04:11:15 UTC |
Source: | https://github.com/mariechion/mi4p |
This function checks the validity of the conditions.
This function was included from the check.conditions
function in the DAPAR
package, since DAPAR
was to be removed from
Bioconductor >= 3.15.
check.conditions(conds)
check.conditions(conds)
conds |
A vector |
A list
Samuel Wieczorek as the author of
check.conditions
in the DAPAR
package.
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') check.conditions(Biobase::pData(Exp1_R25_pept)$Condition) ## End(Not run)
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') check.conditions(Biobase::pData(Exp1_R25_pept)$Condition) ## End(Not run)
This function checks the validity of the experimental design.
This function was included from the check.design
function in the DAPAR
package, since DAPAR
was to be removed from
Bioconductor >= 3.15.
check.design(sTab)
check.design(sTab)
sTab |
The data.frame which correspond to the pData function of MSnbase |
A boolean
Thomas Burger, Quentin Giai-Gianetto, Samuel Wieczorek as
the authors of check.design
in the DAPAR
package.
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') check.design(Biobase::pData(Exp1_R25_pept)[,1:3]) ## End(Not run)
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') check.design(Biobase::pData(Exp1_R25_pept)[,1:3]) ## End(Not run)
This dataset was simulated using the default values of the values of the options of the protdatasim function and the set.seed value set to 4619.
A data frame with 200 observations on the following 11 variables.
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
M. Chion, Ch. Carapito and F. Bertrand.
We simulated the data.
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
data(datasim) str(datasim)
data(datasim) str(datasim)
Modified eBayes function to be used instead of the one in the limma package
eBayes.mod( fit, VarRubin, proportion = 0.01, stdev.coef.lim = c(0.1, 4), trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1) )
eBayes.mod( fit, VarRubin, proportion = 0.01, stdev.coef.lim = c(0.1, 4), trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1) )
fit |
an MArrayLM fitted model object produced by lmFit or contrasts.fit. For ebayes only, fit can alternatively be an unclassed list produced by lm.series, gls.series or mrlm containing components coefficients, stdev.unscaled, sigma and df.residual. |
VarRubin |
a variance-covariance matrix. |
proportion |
numeric value between 0 and 1, assumed proportion of genes which are differentially expressed |
stdev.coef.lim |
numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes |
trend |
logical, should an intensity-trend be allowed for the prior variance? Default is that the prior variance is constant. |
robust |
logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances? |
winsor.tail.p |
numeric vector of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when robust=TRUE. |
eBayes produces an object of class MArrayLM (see MArrayLM-class) containing everything found in fit
plus the following added components:
numeric matrix of moderated t-statistics.
numeric matrix of two-sided p-values corresponding to the t-statistics.
numeric matrix giving the log-odds of differential expression (on the natural log scale).
estimated prior value for sigma^2. A row-wise vector if covariate is non-NULL, otherwise a single value.
degrees of freedom associated with s2.prior. A row-wise vector if robust=TRUE, otherwise a single value.
row-wise numeric vector giving the total degrees of freedom associated with the t-statistics for each gene. Equal to df.prior+df.residual or sum(df.residual), whichever is smaller.
row-wise numeric vector giving the posterior values for sigma^2.
column-wise numeric vector giving estimated prior values for the variance of the log2-fold-changes for differentially expressed gene for each constrast. Used for evaluating lods.
row-wise numeric vector of moderated F-statistics for testing all contrasts defined by the columns of fit simultaneously equal to zero.
row-wise numeric vector giving p-values corresponding to F.
The matrices t, p.value and lods have the same dimensions as the input object fit, with rows corresponding to genes and columns to coefficients or contrasts. The vectors s2.prior, df.prior, df.total, F and F.p.value correspond to rows, with length equal to the number of genes. The vector var.prior corresponds to columns, with length equal to the number of contrasts. If s2.prior or df.prior have length 1, then the same value applies to all genes.
s2.prior, df.prior and var.prior contain empirical Bayes hyperparameters used to obtain df.total, s2.post and lods.
Modified by M. Chion and F. Bertrand. Original by Gordon Smyth and Davis McCarthy
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") VarRubin.matrix <- rubin2.all(datasim_imp[1:5,,], attr(datasim,"metadata")$Condition) set.seed(2016) sigma2 <- 0.05 / rchisq(100, df=10) * 10 y <- datasim_imp[,,1] design <- cbind(Intercept=1,Group=as.numeric( attr(datasim,"metadata")$Condition)-1) fit.model <- limma::lmFit(y,design) eBayes.mod(fit=fit.model,VarRubin.matrix[[1]])
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") VarRubin.matrix <- rubin2.all(datasim_imp[1:5,,], attr(datasim,"metadata")$Condition) set.seed(2016) sigma2 <- 0.05 / rchisq(100, df=10) * 10 y <- datasim_imp[,,1] design <- cbind(Intercept=1,Group=as.numeric( attr(datasim,"metadata")$Condition)-1) fit.model <- limma::lmFit(y,design) eBayes.mod(fit=fit.model,VarRubin.matrix[[1]])
It is not exported by DAPAR
and has to be reproduced here.
formatLimmaResult(fit, conds, contrast)
formatLimmaResult(fit, conds, contrast)
fit |
Limma fit |
conds |
Condition vector |
contrast |
Contrast vector |
A list of two dataframes : logFC and P_Value. The first one contains the logFC values of all the comparisons (one column for one comparison), the second one contains the pvalue of all the comparisons (one column for one comparison). The names of the columns for those two dataframes are identical and correspond to the description of the comparison.
Adapted from the code of Samuel Wieczorek in the DAPAR
package as it is an object that is not exported by the DAPAR
package.
# library(DAPAR) set.seed(2016) data(qData) data(sTab) contrast=1 sTab.old <- sTab conds <- factor(sTab$Condition, levels = unique(sTab$Condition)) sTab <- sTab[unlist(lapply(split(sTab, conds), function(x) { x["Sample.name"] })), ] qData <- qData[, unlist(lapply(split(sTab.old, conds), function(x) { x["Sample.name"] }))] conds <- conds[order(conds)] res.l <- NULL design.matrix <- mi4p::make.design(sTab) contra <- mi4p::make.contrast(design.matrix, condition = conds, contrast) cmtx <- limma::makeContrasts(contrasts = contra, levels = make.names(colnames(design.matrix))) fit <- limma::eBayes(limma::contrasts.fit(limma::lmFit(qData, design.matrix), cmtx)) res.l <- mi4p::formatLimmaResult(fit, conds, contrast)
# library(DAPAR) set.seed(2016) data(qData) data(sTab) contrast=1 sTab.old <- sTab conds <- factor(sTab$Condition, levels = unique(sTab$Condition)) sTab <- sTab[unlist(lapply(split(sTab, conds), function(x) { x["Sample.name"] })), ] qData <- qData[, unlist(lapply(split(sTab.old, conds), function(x) { x["Sample.name"] }))] conds <- conds[order(conds)] res.l <- NULL design.matrix <- mi4p::make.design(sTab) contra <- mi4p::make.contrast(design.matrix, condition = conds, contrast) cmtx <- limma::makeContrasts(contrasts = contra, levels = make.names(colnames(design.matrix))) fit <- limma::eBayes(limma::contrasts.fit(limma::lmFit(qData, design.matrix), cmtx)) res.l <- mi4p::formatLimmaResult(fit, conds, contrast)
Modified eBayes function to be used instead of the one, .ebayes
, implemented in the limma package
hid.ebayes( fit, VarRubin, mod = TRUE, proportion = 0.01, stdev.coef.lim = c(0.1, 4), trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1), legacy = TRUE )
hid.ebayes( fit, VarRubin, mod = TRUE, proportion = 0.01, stdev.coef.lim = c(0.1, 4), trend = FALSE, robust = FALSE, winsor.tail.p = c(0.05, 0.1), legacy = TRUE )
fit |
an MArrayLM fitted model object produced by lmFit or contrasts.fit. For ebayes only, fit can alternatively be an unclassed list produced by lm.series, gls.series or mrlm containing components coefficients, stdev.unscaled, sigma and df.residual. |
VarRubin |
a variance-covariance matrix. |
mod |
TRUE (not used at the moment) |
proportion |
numeric value between 0 and 1, assumed proportion of genes which are differentially expressed |
stdev.coef.lim |
numeric vector of length 2, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes |
trend |
logical, should an intensity-trend be allowed for the prior variance? Default is that the prior variance is constant. |
robust |
logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances? |
winsor.tail.p |
numeric vector of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when robust=TRUE. |
legacy |
boolean to stick to former way to squeeze variances. Defaults to TRUE. |
eBayes produces an object of class MArrayLM (see MArrayLM-class) containing everything found in fit
plus the following added components:
numeric matrix of moderated t-statistics.
numeric matrix of two-sided p-values corresponding to the t-statistics.
numeric matrix giving the log-odds of differential expression (on the natural log scale).
estimated prior value for sigma^2. A row-wise vector if covariate is non-NULL, otherwise a single value.
degrees of freedom associated with s2.prior. A row-wise vector if robust=TRUE, otherwise a single value.
row-wise numeric vector giving the total degrees of freedom associated with the t-statistics for each gene. Equal to df.prior+df.residual or sum(df.residual), whichever is smaller.
row-wise numeric vector giving the posterior values for sigma^2.
column-wise numeric vector giving estimated prior values for the variance of the log2-fold-changes for differentially expressed gene for each constrast. Used for evaluating lods.
row-wise numeric vector of moderated F-statistics for testing all contrasts defined by the columns of fit simultaneously equal to zero.
row-wise numeric vector giving p-values corresponding to F.
The matrices t, p.value and lods have the same dimensions as the input object fit, with rows corresponding to genes and columns to coefficients or contrasts. The vectors s2.prior, df.prior, df.total, F and F.p.value correspond to rows, with length equal to the number of genes. The vector var.prior corresponds to columns, with length equal to the number of contrasts. If s2.prior or df.prior have length 1, then the same value applies to all genes.
s2.prior, df.prior and var.prior contain empirical Bayes hyperparameters used to obtain df.total, s2.post and lods.
Modified by M. Chion and F. Bertrand. Original by Gordon Smyth and Davis McCarthy
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") VarRubin.matrix <- rubin2.all(datasim_imp[1:5,,], attr(datasim,"metadata")$Condition) set.seed(2016) sigma2 <- 0.05 / rchisq(100, df=10) * 10 y <- datasim_imp[,,1] design <- cbind(Intercept=1,Group=as.numeric( attr(datasim,"metadata")$Condition)-1) fit.model <- limma::lmFit(y,design) hid.ebayes(fit=fit.model,VarRubin.matrix[[1]])
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") VarRubin.matrix <- rubin2.all(datasim_imp[1:5,,], attr(datasim,"metadata")$Condition) set.seed(2016) sigma2 <- 0.05 / rchisq(100, df=10) * 10 y <- datasim_imp[,,1] design <- cbind(Intercept=1,Group=as.numeric( attr(datasim,"metadata")$Condition)-1) fit.model <- limma::lmFit(y,design) hid.ebayes(fit=fit.model,VarRubin.matrix[[1]])
Modified version of the limmaCompleteTest
function from the DAPAR
package to return both the fit and the results.
limmaCompleteTest.mod(qData, sTab, comp.type = "OnevsOne")
limmaCompleteTest.mod(qData, sTab, comp.type = "OnevsOne")
qData |
A matrix of quantitative data, without any missing values. |
sTab |
A dataframe of experimental design ( |
comp.type |
A string that corresponds to the type of comparison. Values are: 'anova1way', 'OnevsOne' and 'OnevsAll'; default is 'OnevsOne'. |
A list of two dataframes : logFC and P_Value. The first one contains the logFC values of all the comparisons (one column for one comparison), the second one contains the pvalue of all the comparisons (one column for one comparison). The names of the columns for those two dataframes are identical and correspond to the description of the comparison.
Adapted from Hélène Borges, Thomas Burger, Quentin Giai-Gianetto, Samuel Wieczorek
set.seed(2016) data(qData) data(sTab) limma <- limmaCompleteTest.mod(qData, sTab, comp.type='OnevsOne')
set.seed(2016) data(qData) data(sTab) limma <- limmaCompleteTest.mod(qData, sTab, comp.type='OnevsOne')
This function builds the contrast matrix
make.contrast(design, condition, contrast = 1)
make.contrast(design, condition, contrast = 1)
design |
The data.frame which correspond to the pData function of MSnbase |
condition |
xxxxx |
contrast |
An integer that Indicates if the test consists of the comparison of each biological condition versus each of the other ones (Contrast=1; for example H0:"C1=C2" vs H1:"C1!=C2", etc.) or each condition versus all others (Contrast=2; e.g. H0:"C1=(C2+C3)/2" vs H1:"C1!=(C2+C3)/2", etc. if there are three conditions). |
A constrat matrix
Thomas Burger, Quentin Giai-Gianetto, Samuel Wieczorek originally in
the DAPAR
package. Included in this package since DAPAR
was to be removed from
Bioconductor >= 3.15.
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') design <- make.design(Biobase::pData(Exp1_R25_pept)) conds <- Biobase::pData(Exp1_R25_pept)$Condition make.contrast(design, conds) ## End(Not run)
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') design <- make.design(Biobase::pData(Exp1_R25_pept)) conds <- Biobase::pData(Exp1_R25_pept)$Condition make.contrast(design, conds) ## End(Not run)
This function builds the design matrix
make.design(sTab)
make.design(sTab)
sTab |
The data.frame which correspond to the pData function of MSnbase |
A design matrix
Thomas Burger, Quentin Giai-Gianetto, Samuel Wieczorek
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') make.design(Biobase::pData(Exp1_R25_pept)) ## End(Not run)
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') make.design(Biobase::pData(Exp1_R25_pept)) ## End(Not run)
This function builds the design matrix for design of level 1
make.design.1(sTab)
make.design.1(sTab)
sTab |
The data.frame which correspond to the pData function of MSnbase |
A design matrix
Thomas Burger, Quentin Giai-Gianetto, Samuel Wieczorek
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') make.design.1(Biobase::pData(Exp1_R25_pept)) ## End(Not run)
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') make.design.1(Biobase::pData(Exp1_R25_pept)) ## End(Not run)
This function builds the design matrix for design of level 2
make.design.2(sTab)
make.design.2(sTab)
sTab |
The data.frame which correspond to the pData function of MSnbase |
A design matrix
Thomas Burger, Quentin Giai-Gianetto, Samuel Wieczorek
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') make.design.2(Biobase::pData(Exp1_R25_pept)) ## End(Not run)
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') make.design.2(Biobase::pData(Exp1_R25_pept)) ## End(Not run)
This function builds the design matrix for design of level 3
make.design.3(sTab)
make.design.3(sTab)
sTab |
The data.frame which correspond to the pData function of MSnbase |
A design matrix
Thomas Burger, Quentin Giai-Gianetto, Samuel Wieczorek originally in the DAPAR package. Included in this package since DAPAR was to be removed from Bioconductor >= 3.15.
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') sTab <-cbind(Biobase::pData(Exp1_R25_pept), Tech.Rep=1:6) make.design.3(sTab) ## End(Not run)
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') sTab <-cbind(Biobase::pData(Exp1_R25_pept), Tech.Rep=1:6) make.design.3(sTab) ## End(Not run)
Computes the multiple imputation parameter estimate using the emmeans package.
meanImp_emmeans(ind, peptide = 1, tabdata, metacond)
meanImp_emmeans(ind, peptide = 1, tabdata, metacond)
ind |
index |
peptide |
name of the peptide |
tabdata |
dataset |
metacond |
a factor to specify the groups |
A vector.
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") meanImp_emmeans(1,1,datasim_imp,attr(datasim,"metadata")$Condition)
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") meanImp_emmeans(1,1,datasim_imp,attr(datasim,"metadata")$Condition)
This function performs hierarchical differential analysis using a moderated t-test statistic, which accounts for multiple imputation variability if applicable.
mi4limma(qData, sTab, VarRubin, comp.type = "OnevsOne", robust = FALSE)
mi4limma(qData, sTab, VarRubin, comp.type = "OnevsOne", robust = FALSE)
qData |
A matrix of quantitative data, without any missing values. It
should be the averaged matrix from the array resulting from
|
sTab |
The experimental matrix, also corresponding to the pData function of MSnbase. |
VarRubin |
A numerical vector, resulting from |
comp.type |
A string that corresponds to the type of comparison. Values are: 'anova1way', 'OnevsOne' and 'OnevsAll'; default is 'OnevsOne'. |
robust |
logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances? (as in limma's eBayes) |
A list of two dataframes : logFC and P_Value. The first one contains the logFC values of all the comparisons (one column for one comparison), the second one contains the pvalue of all the comparisons (one column for one comparison). The names of the columns for those two dataframes are identical and correspond to the description of the comparison.
Adapted by Marie Chion, from limmaCompleteTest
of the
DAPAR
package by Hélène Borges, Thomas Burger,
Quentin Giai-Gianetto and Samuel Wieczorek.
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
set.seed(2016) data(qData) data(sTab) fit.limma <- mi4limma(qData, sTab, diag(1,2))
set.seed(2016) data(qData) data(sTab) fit.limma <- mi4limma(qData, sTab, diag(1,2))
A dataset containing the protein and petide information and peptide-level intensities for 6 samples: 3 CG and 3 mCG groups. There are 69 proteins. The columns are as follows:
data(mm_peptides)
data(mm_peptides)
A data frame with 1102 rows and 13 colummns, compiring 7 columns of metadata and 6 columns of peptide intensities. 69 proteins.
Sequence - peptide sequence - randomly chosen from a larger list of sequences
MatchedID - numeric ID that links proteins in the two datasets, unnecessary if datasets are for the same species
ProtID - protein ID, artificial protein ID, eg. Prot1, Prot2, ...
GeneID - gene ID, artificial gene ID, eg. Gene1, Gene2, ...
ProtName - artificial Protein Name
ProtIDLong - long protein ID, full protein name, here artificially simulated
GeneIDLong - long gene ID, full gene name, here artificially simulated
CG1 - raw intensity column for sample 1 in CG group
CG2 - raw intensity column for sample 2 in CG group
CG3 - raw intensity column for sample 3 in CG group
mCG1 - raw intensity column for sample 1 in mCG group
mCG2 - raw intensity column for sample 2 in mCG group
mCG3 - raw intensity column for sample 3 in mCG group
multi.impute
performs multiple imputation on a
given quantitative proteomics dataset.
multi.impute(data, conditions, nb.imp = NULL, method, parallel = FALSE)
multi.impute(data, conditions, nb.imp = NULL, method, parallel = FALSE)
data |
A quantitative matrix to be imputed, with proteins/peptides in rows and samples in columns. |
conditions |
A vector of length the number of samples where each element corresponds to the condition the sample belongs to. |
nb.imp |
The number of imputation to perform. |
method |
A single character string describing the imputation method to be used. See details. |
parallel |
Logical, whether or not use parallel computing
(with |
Multiple imputation consists in imputing several times a given
dataset using a given method. Here, imputation methods can be chosen either
from mice
, imp4p-package
or
impute.knn
:
"pmm", "midastouch", "sample", "cart", "rf","mean", "norm",
"norm.nob", "norm.boot", "norm.predict": imputation methods as described
in mice
.
"RF" imputes missing values using random forests algorithm as
described in impute.RF
.
"MLE" imputes missing values using maximum likelihood estimation
as described in impute.mle
.
"PCA" imputes missing values using principal component analysis as
described in impute.PCA
.
"SLSA" imputes missing values using structured least squares
algorithm as described in impute.slsa
.
"kNN" imputes missing values using k nearest neighbors as
described in impute.knn
.
A numeric array of dimension c(dim(data),nb.imp).
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. arxiv:2108.07086. https://arxiv.org/abs/2108.07086.
library(mi4p) data(datasim) multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE")
library(mi4p) data(datasim) multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE")
This function is designed to ampute datasets.
MVgen(dataset, prop_NA)
MVgen(dataset, prop_NA)
dataset |
dataset to be amputed |
prop_NA |
desired proportion of missing values in the amputed dataset |
A dataset with missing values.
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
library(mi4p) data(datasim) datasim_amp <- MVgen(datasim, .2) sum(is.na(datasim_amp))/prod(dim(datasim_amp))
library(mi4p) data(datasim) datasim_amp <- MVgen(datasim, .2) sum(is.na(datasim_amp))/prod(dim(datasim_amp))
This list of 100 datasets was simulated using the default values of the options of the protdatasim function and the set.seed value set to 4619.
The format is: List of 100 data.frames.
200 obs. of 11 variables
int [1:200] 1 2 3 4 5 6 7 8 9 10 ...
num [1:200] 99.6 99.9 100.2 99.8 100.4 ...
num [1:200] 97.4 101.3 100.3 100.2 101.7 ...
num [1:200] 100.3 100.9 99.1 101.2 100.6 ...
num [1:200] 99.4 99.2 98.5 99.1 99.5 ...
num [1:200] 98.5 99.7 100 100.2 100.7 ...
num [1:200] 200 199 199 200 199 ...
num [1:200] 200 200 202 199 199 ...
num [1:200] 202 199 200 199 201 ...
num [1:200] 200 200 199 201 200 ...
num [1:200] 200 198 200 201 199 ...
'data.frame': 10 obs. of 3 variables:
chr [1:10] "X1" "X2" "X3" "X4" ...
Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
int [1:10] 1 2 3 4 5 6 7 8 9 10
...
M. Chion, Ch. Carapito and F. Bertrand.
We simulated the data.
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
data(norm.200.m100.sd1.vs.m200.sd1.list) str(norm.200.m100.sd1.vs.m200.sd1.list)
data(norm.200.m100.sd1.vs.m200.sd1.list) str(norm.200.m100.sd1.vs.m200.sd1.list)
Use a projection of the given variance-covariance matrix.
proj_matrix(VarRubin.matrix, metadata)
proj_matrix(VarRubin.matrix, metadata)
VarRubin.matrix |
A variance-covariance matrix. |
metadata |
Metadata of the experiment. |
A list of variance-covariance matrices.
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") VarRubin.matrix <- rubin2.all(datasim_imp[1:5,,], attr(datasim,"metadata")$Condition) proj_matrix(VarRubin.matrix, attr(datasim,"metadata"))
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") VarRubin.matrix <- rubin2.all(datasim_imp[1:5,,], attr(datasim,"metadata")$Condition) proj_matrix(VarRubin.matrix, attr(datasim,"metadata"))
Function to simulate benchmark datasets.
protdatasim( iii = 1, nobs = 200, nobs1 = 10, ng1 = 5, ng2 = 5, mg1 = 100, mg2 = 200, dispg1 = 1, dispg2 = 1 )
protdatasim( iii = 1, nobs = 200, nobs1 = 10, ng1 = 5, ng2 = 5, mg1 = 100, mg2 = 200, dispg1 = 1, dispg2 = 1 )
iii |
A parameter useful to loop over for simulated lists of datasets. It has no effect. |
nobs |
Number of peptides |
nobs1 |
Number of peptides with differential expressions between the two conditions |
ng1 |
Number of biological replicates in condition A |
ng2 |
Number of biological replicates in condition B |
mg1 |
Mean in condition A |
mg2 |
Mean in condition B |
dispg1 |
Dispersion in condition A |
dispg2 |
Dispersion in condition B |
A data frame with the simulated and attribute metadata.
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
data_sim <- protdatasim() attr(data_sim,"metadata") norm.200.m100.sd1.vs.m200.sd1_list <- lapply(1:100, protdatasim) attr(norm.200.m100.sd1.vs.m200.sd1_list[[1]],"metadata")
data_sim <- protdatasim() attr(data_sim,"metadata") norm.200.m100.sd1.vs.m200.sd1_list <- lapply(1:100, protdatasim) attr(norm.200.m100.sd1.vs.m200.sd1_list[[1]],"metadata")
The data frame qData
contains the first 500 rows of six columns that are the quantitation of peptides for the six replicates. They were obtained using the code exprs(Exp1_R25_pept)[1:500,]
.
The format is: num [1:500, 1:6] 24.8 24.7 24.6 NA 24.5 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:500] "0" "1" "2" "3" ... ..$ : chr [1:6] "Intensity_C_R1" "Intensity_C_R2" "Intensity_C_R3" "Intensity_D_R1" ...
The DAPARdata
's Exp1_R25_pept
dataset is the final outcome of a quantitative mass spectrometry-based proteomic analysis of two samples containing different concentrations of 48 human proteins (UPS1 standard from Sigma-Aldrich) within a constant yeast background (see Giai Gianetto et al. (2016) for details). It contains the abundance values of the different human and yeast peptides identified and quantified in these two conditions. The two conditions represent the measured abundances of peptides when respectively 25 fmol and 10 fmol of UPS1 human proteins were mixed with the yeast extract before mass spectrometry analyses. This results in a concentration ratio of 2.5. Three technical replicates were acquired for each condition.
The DAPARdata
package.
Cox J., Hein M.Y., Luber C.A., Paron I., Nagaraj N., Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteomics. 2014 Sep, 13(9):2513-26.
Giai Gianetto, Q., Combes, F., Ramus, C., Bruley, C., Coute, Y., Burger, T. (2016). Calibration plot for proteomics: A graphical tool to visually check the assumptions underlying FDR control in quantitative experiments. Proteomics, 16(1), 29-32.
data(qData) str(qData) pairs(qData)
data(qData) str(qData) pairs(qData)
Computes the first Rubin's rule for all the peptides.
rubin1.all( data, metacond, funcmean = meanImp_emmeans, is.parallel = FALSE, verbose = FALSE )
rubin1.all( data, metacond, funcmean = meanImp_emmeans, is.parallel = FALSE, verbose = FALSE )
data |
dataset |
metacond |
a factor to specify the groups |
funcmean |
function that should be used to compute the mean |
is.parallel |
Logical, whether or not use parallel computing
(with |
verbose |
Logical, should messages be displayed? |
A vector of estimated parameters.
Frédéric Bertrand
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin1.all(datasim_imp[1:5,,],funcmean = meanImp_emmeans, attr(datasim,"metadata")$Condition)
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin1.all(datasim_imp[1:5,,],funcmean = meanImp_emmeans, attr(datasim,"metadata")$Condition)
Computes the first Rubin's rule for a given peptide.
rubin1.one(peptide, data, funcmean = meanImp_emmeans, metacond)
rubin1.one(peptide, data, funcmean = meanImp_emmeans, metacond)
peptide |
peptide for which the variance-covariance matrix should be derived. |
data |
dataset |
funcmean |
function that should be used to compute the mean |
metacond |
a factor to specify the groups |
A vector of estimated parameters.
Frédéric Bertrand
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. arxiv:2108.07086. https://arxiv.org/abs/2108.07086.
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin1.one(1,datasim_imp,funcmean = meanImp_emmeans, attr(datasim,"metadata")$Condition)
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin1.one(1,datasim_imp,funcmean = meanImp_emmeans, attr(datasim,"metadata")$Condition)
Computes the total variance-covariance component in the 2nd Rubin's rule for all peptides.
rubin2.all( data, metacond, funcmean = meanImp_emmeans, funcvar = within_variance_comp_emmeans, is.parallel = FALSE, verbose = FALSE )
rubin2.all( data, metacond, funcmean = meanImp_emmeans, funcvar = within_variance_comp_emmeans, is.parallel = FALSE, verbose = FALSE )
data |
dataset |
metacond |
a factor to specify the groups |
funcmean |
function that should be used to compute the mean |
funcvar |
function that should be used to compute the variance |
is.parallel |
should parallel computing be used? |
verbose |
should messages be displayed? |
List of variance-covariance matrices.
Frédéric Bertrand
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin2.all(datasim_imp[1:5,,],attr(datasim,"metadata")$Condition)
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin2.all(datasim_imp[1:5,,],attr(datasim,"metadata")$Condition)
Computes the between-imputation component in the 2nd Rubin's rule for all peptides.
rubin2bt.all( data, funcmean = meanImp_emmeans, metacond, is.parallel = FALSE, verbose = FALSE )
rubin2bt.all( data, funcmean = meanImp_emmeans, metacond, is.parallel = FALSE, verbose = FALSE )
data |
dataset |
funcmean |
function that should be used to compute the mean |
metacond |
a factor to specify the groups |
is.parallel |
should parallel computing be used? |
verbose |
should messages be displayed? |
List of variance-covariance matrices.
Frédéric Bertrand
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin2bt.all(datasim_imp[1:5,,],funcmean = meanImp_emmeans, attr(datasim,"metadata")$Condition)
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin2bt.all(datasim_imp[1:5,,],funcmean = meanImp_emmeans, attr(datasim,"metadata")$Condition)
Computes the between-imputation component in the 2nd Rubin's rule for a given peptide.
rubin2bt.one(peptide, data, funcmean, metacond)
rubin2bt.one(peptide, data, funcmean, metacond)
peptide |
peptide for which the variance-covariance matrix should be derived. |
data |
dataset |
funcmean |
function that should be used to compute the mean |
metacond |
a factor to specify the groups |
A variance-covariance matrix.
Frédéric Bertrand
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin2bt.one(1,datasim_imp,funcmean = meanImp_emmeans, attr(datasim,"metadata")$Condition)
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin2bt.one(1,datasim_imp,funcmean = meanImp_emmeans, attr(datasim,"metadata")$Condition)
Computes the within-variance component in the 2nd Rubin's rule for all peptides.
rubin2wt.all( data, funcvar = mi4p::within_variance_comp_emmeans, metacond, is.parallel = FALSE, verbose = TRUE )
rubin2wt.all( data, funcvar = mi4p::within_variance_comp_emmeans, metacond, is.parallel = FALSE, verbose = TRUE )
data |
dataset |
funcvar |
function that should be used to compute the variance |
metacond |
a factor to specify the groups |
is.parallel |
should parallel computing be used? |
verbose |
should messages be displayed? |
List of variance-covariance matrices.
Frédéric Bertrand
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin2wt.all(datasim_imp[1:5,,],funcvar = within_variance_comp_emmeans, attr(datasim,"metadata")$Condition)
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin2wt.all(datasim_imp[1:5,,],funcvar = within_variance_comp_emmeans, attr(datasim,"metadata")$Condition)
Computes the within-variance component in the 2nd Rubin's rule for a given peptide.
rubin2wt.one(peptide, data, funcvar, metacond)
rubin2wt.one(peptide, data, funcvar, metacond)
peptide |
peptide for which the variance-covariance matrix should be derived. |
data |
dataset |
funcvar |
function that should be used to compute the variance |
metacond |
a factor to specify the groups |
A variance-covariance matrix.
Frédéric Bertrand
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin2wt.one(1,datasim_imp,funcvar=within_variance_comp_emmeans, attr(datasim,"metadata")$Condition)
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") rubin2wt.one(1,datasim_imp,funcvar=within_variance_comp_emmeans, attr(datasim,"metadata")$Condition)
The data frame sTab
contains the experimental design and gives few informations about the samples. They were obtained using the code pData(Exp1_R25_pept)
.
A data frame with 6 observations on the following 3 variables.
a character vector
a character vector
a numeric vector
The DAPARdata
's Exp1_R25_pept
dataset is the final outcome of a quantitative mass spectrometry-based proteomic analysis of two samples containing different concentrations of 48 human proteins (UPS1 standard from Sigma-Aldrich) within a constant yeast background (see Giai Gianetto et al. (2016) for details). It contains the abundance values of the different human and yeast peptides identified and quantified in these two conditions. The two conditions represent the measured abundances of peptides when respectively 25 fmol and 10 fmol of UPS1 human proteins were mixed with the yeast extract before mass spectrometry analyses. This results in a concentration ratio of 2.5. Three technical replicates were acquired for each condition.
The DAPARdata
package.
Cox J., Hein M.Y., Luber C.A., Paron I., Nagaraj N., Mann M. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol Cell Proteomics. 2014 Sep, 13(9):2513-26.
Giai Gianetto, Q., Combes, F., Ramus, C., Bruley, C., Coute, Y., Burger, T. (2016). Calibration plot for proteomics: A graphical tool to visually check the assumptions underlying FDR control in quantitative experiments. Proteomics, 16(1), 29-32.
data(sTab) str(sTab)
data(sTab) str(sTab)
This function check xxxxx
test.design(tab)
test.design(tab)
tab |
A data.frame which correspond to xxxxxx |
A list of two items
Thomas Burger, Quentin Giai-Gianetto, Samuel Wieczorek originally in
the DAPAR
package. Included in this package since DAPAR
was to be removed from
Bioconductor >= 3.15.
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') test.design(Biobase::pData(Exp1_R25_pept)[,1:3]) ## End(Not run)
## Not run: utils::data(Exp1_R25_pept, package='DAPARdata') test.design(Biobase::pData(Exp1_R25_pept)[,1:3]) ## End(Not run)
Computes the multiple imputation within variance component using the emmeans package.
within_variance_comp_emmeans(ind, peptide, data, metacond)
within_variance_comp_emmeans(ind, peptide, data, metacond)
ind |
index |
peptide |
name of the peptide |
data |
dataset |
metacond |
a factor to specify the groups |
A variance-covariance matrix.
Frédéric Bertrand
M. Chion, Ch. Carapito and F. Bertrand (2021). Accounting for multiple imputation-induced variability for differential analysis in mass spectrometry-based label-free quantitative proteomics. doi:10.1371/journal.pcbi.1010420.
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") within_variance_comp_emmeans(1,1,datasim_imp, attr(datasim,"metadata")$Condition)
library(mi4p) data(datasim) datasim_imp <- multi.impute(data = datasim[,-1], conditions = attr(datasim,"metadata")$Condition, method = "MLE") within_variance_comp_emmeans(1,1,datasim_imp, attr(datasim,"metadata")$Condition)