| Title: | Bayesian Statistical Tools for Quantitative Proteomics |
|---|---|
| Description: | Bayesian toolbox for quantitative proteomics. In particular, this package provides functions to generate synthetic datasets, execute Bayesian differential analysis methods and display results, as described in the associated article Marie Chion and Arthur Leroy (2025) <arXiv:2307.08975>. |
| Authors: | Arthur Leroy [aut, cre] (ORCID: <https://orcid.org/0000-0003-0806-8934>), Marie Chion [aut] (ORCID: <https://orcid.org/0000-0001-8956-8388>) |
| Maintainer: | Arthur Leroy <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-19 09:59:55 UTC |
| Source: | https://github.com/mariechion/proteobayes |
Compute a criterion based on Credible Intervals (CI) to determine whether
the posterior t-distributions of groups should be considered different enough
to deserve further examination. Two groups are considered probably 'distinct'
if the Credible Intervals of level CI_level of their respective
posterior t-distributions do not overlap.
identify_diff(posterior)

| Argument | Description |
|---|---|
| posterior | A tibble, typically coming from a |

A tibble, indicating which peptides and groups seem to be different
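The non-overlap criterion described above can be sketched in base R for two univariate posterior t-distributions. This is only an illustration under stated assumptions, not the package's implementation: the helper names `t_credible_interval` and `intervals_overlap` are hypothetical, and `var` is treated as the squared scale parameter of a location-scale t-distribution.

```r
# Central credible interval of a location-scale Student t-distribution.
# Assumption: `var` is the squared scale parameter, not the variance.
t_credible_interval <- function(mean, var, df, level = 0.95) {
  alpha <- 1 - level
  mean + sqrt(var) * qt(c(alpha / 2, 1 - alpha / 2), df = df)
}

# Two intervals overlap if each one starts before the other ends.
intervals_overlap <- function(ci1, ci2) {
  ci1[1] <= ci2[2] && ci2[1] <= ci1[2]
}

ci_a <- t_credible_interval(mean = 10, var = 0.5, df = 8)
ci_b <- t_credible_interval(mean = 14, var = 0.5, df = 8)
ci_c <- t_credible_interval(mean = 10.5, var = 0.5, df = 8)

# Groups are flagged as probably 'distinct' when their intervals are disjoint.
distinct_ab <- !intervals_overlap(ci_a, ci_b)
distinct_ac <- !intervals_overlap(ci_a, ci_c)
```

In this toy setting the intervals of groups a and b are disjoint, while those of a and c overlap, so only the first pair would be flagged.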
Compute whether a given vector belongs to the multivariate t-distribution Credible Interval for given level, mean, covariance and degrees of freedom.
multi_CI(data, mean, cov, df, level = 0.95)

| Argument | Description |
|---|---|
| data | A vector, of compatible dimension with |
| mean | A vector, the mean parameter of the multivariate t-distribution |
| cov | A matrix, the covariance parameter of the multivariate t-distribution |
| df | A number, the degrees of freedom of the multivariate t-distribution |
| level | A number, between 0 and 1, corresponding to the level of the Credible Interval. Default is 0.95. |

A boolean, indicating whether the data vector belongs to the
computed Credible Interval.
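One standard way to test membership in a credible region of a multivariate t-distribution uses the squared Mahalanobis distance, which, divided by the dimension, follows an F distribution. The sketch below relies on that property; whether the package computes the region this way is an assumption, the function name `in_credible_region` is hypothetical, and `cov` is taken as the scale matrix.

```r
# Check whether `x` lies inside the elliptical credible region of a
# multivariate t-distribution (assumption: `cov` is the scale matrix).
in_credible_region <- function(x, mean, cov, df, level = 0.95) {
  p <- length(mean)
  # Squared Mahalanobis distance between x and the location parameter.
  d2 <- mahalanobis(x, center = mean, cov = cov)
  # For a multivariate t-distribution, d2 / p follows an F(p, df) law.
  unname(d2 / p <= qf(level, df1 = p, df2 = df))
}

mu <- c(0, 0)
Sigma <- matrix(c(1, 0.3, 0.3, 1), nrow = 2)

inside <- in_credible_region(c(0.5, -0.2), mu, Sigma, df = 10)
outside <- in_credible_region(c(6, -6), mu, Sigma, df = 10)
```

A point close to the location parameter falls inside the 95% region, whereas a point far out in the tails does not.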
Compute a multivariate inference criterion to examine whether the posterior multivariate t-distributions of groups should be considered different enough to be called 'differential'. Two groups can be discriminated based on the probability weights for the element-wise means to be greater in each group. The Credible Intervals for each marginal are also provided.
multi_identify_diff(posterior, plot = TRUE, overlap_coef = TRUE, cumulative = FALSE, nb_sample = 1e+05, nb_sample_overlap = 10000)

| Argument | Description |
|---|---|
| posterior | A tibble, typically coming from a |
| plot | A boolean, indicating whether a results plot should be displayed. |
| overlap_coef | A boolean, indicating whether the overlapping coefficient between multivariate t-distributions should be computed for all groups. |
| cumulative | A boolean, indicating whether the probability distribution should be the cumulative instead. |
| nb_sample | A number of samples to draw from the empirical distributions. |
| nb_sample_overlap | A number of samples to draw for the Monte Carlo approximation of the Overlapping Coefficients between all posteriors. |

A list, containing:
Diff_mean, a tibble containing the posterior mean (and their difference) for each peptide and for all one-by-one group comparisons.
Diff_proba, a tibble containing the probability distribution of the number of differential peptides for all one-by-one group comparisons.
Overlap_coef, a tibble containing the overlapping coefficient between the posterior multivariate t-distributions for all one-by-one group comparisons.
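To make the idea behind Diff_proba concrete, the following base R sketch draws samples from two multivariate t-distributions and tabulates, for each draw, how many peptides have a larger mean in one group. It is an illustration under assumptions: the sampler `rmvt_sketch`, the toy parameter values and the variable names are hypothetical and not taken from the package.

```r
set.seed(1)

# Draw n samples from a multivariate t-distribution using base R only:
# a Gaussian draw rescaled by an independent chi-squared variable.
rmvt_sketch <- function(n, mean, cov, df) {
  p <- length(mean)
  z <- matrix(rnorm(n * p), nrow = n) %*% chol(cov)
  w <- sqrt(df / rchisq(n, df = df))
  sweep(z * w, 2, mean, "+")
}

nb_sample <- 10000
scale_mat <- diag(3) * 0.2
group_a <- rmvt_sketch(nb_sample, mean = c(10, 12, 8), cov = scale_mat, df = 12)
group_b <- rmvt_sketch(nb_sample, mean = c(10, 15, 11), cov = scale_mat, df = 12)

# For each draw, count the peptides whose mean is larger in group B.
nb_larger <- rowSums(group_b > group_a)

# Empirical probability distribution of that count (0 to 3 peptides).
diff_proba <- table(factor(nb_larger, levels = 0:3)) / nb_sample

# Cumulative version of the same distribution.
diff_cumul <- cumsum(diff_proba)
```

Here the first peptide has the same mean in both groups, so most of the probability mass is expected on the counts 2 and 3.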
Compute a Monte Carlo approximation of the overlapping coefficient between two multivariate t-distributions with arbitrary mean, covariance and degrees of freedom.
multi_overlap_coef(mean1, mean2, cov1, cov2, df1, df2, nb_sample = 10000)

| Argument | Description |
|---|---|
| mean1 | A vector, the mean parameter of a multivariate t-distribution |
| mean2 | A vector, the mean parameter of the other multivariate t-distribution |
| cov1 | A matrix, the covariance parameter of a multivariate t-distribution |
| cov2 | A matrix, the covariance parameter of the other multivariate t-distribution |
| df1 | A number, the degrees of freedom of a multivariate t-distribution |
| df2 | A number, the degrees of freedom of the other multivariate t-distribution |
| nb_sample | A number of samples drawn to compute the Monte Carlo estimation |

A number, the Monte Carlo approximation of the overlapping coefficient between the two multivariate t-distributions.
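A Monte Carlo estimate of the overlapping coefficient can be built from the identity OVL = ∫ min(f1, f2) = E_f1[min(1, f2(X) / f1(X))]. The sketch below applies it in base R; the estimator actually used by the package may differ, and the helpers `rmvt_sketch`, `dmvt_log` and `mc_overlap_sketch` are hypothetical names introduced for illustration, with `cov` taken as the scale matrix.

```r
set.seed(42)

# Base R sampler for a multivariate t-distribution.
rmvt_sketch <- function(n, mean, cov, df) {
  p <- length(mean)
  z <- matrix(rnorm(n * p), nrow = n) %*% chol(cov)
  w <- sqrt(df / rchisq(n, df = df))
  sweep(z * w, 2, mean, "+")
}

# Log-density of a multivariate t-distribution, evaluated on the rows of x.
dmvt_log <- function(x, mean, cov, df) {
  p <- length(mean)
  d2 <- mahalanobis(x, center = mean, cov = cov)
  log_det <- as.numeric(determinant(cov, logarithm = TRUE)$modulus)
  lgamma((df + p) / 2) - lgamma(df / 2) - (p / 2) * log(df * pi) -
    0.5 * log_det - ((df + p) / 2) * log1p(d2 / df)
}

# Average min(1, f2 / f1) over samples drawn from the first distribution.
mc_overlap_sketch <- function(mean1, mean2, cov1, cov2, df1, df2,
                              nb_sample = 10000) {
  x <- rmvt_sketch(nb_sample, mean1, cov1, df1)
  log_ratio <- dmvt_log(x, mean2, cov2, df2) - dmvt_log(x, mean1, cov1, df1)
  mean(pmin(1, exp(log_ratio)))
}

same <- mc_overlap_sketch(c(0, 0), c(0, 0), diag(2), diag(2), 5, 5)
partial <- mc_overlap_sketch(c(0, 0), c(1, 1), diag(2), diag(2), 5, 5)
far <- mc_overlap_sketch(c(0, 0), c(50, 50), diag(2), diag(2), 20, 20)
```

Identical distributions give a coefficient of 1, well-separated ones give a value close to 0, and intermediate cases fall in between.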
Compute the multivariate posterior distribution of the means
between multiple groups, for multiple correlated peptides. The function
accounts for multiple imputations through the Draw identifier in the
dataset.
multi_posterior_mean(data, mu_0 = NULL, lambda_0 = 1, Sigma_0 = NULL, nu_0 = 10, vectorised = FALSE)

| Argument | Description |
|---|---|
| data | A tibble or data frame containing imputed data sets for all groups. Required columns: |
| mu_0 | A vector, corresponding to the prior mean. If NULL, all groups are initialised with the same empirical mean for each peptide. |
| lambda_0 | A number, corresponding to the prior covariance scaling parameter. |
| Sigma_0 | A matrix, corresponding to the prior covariance parameter. If NULL, the identity matrix will be used by default. |
| nu_0 | A number, corresponding to the prior degrees of freedom. |
| vectorised | A boolean, indicating whether a vectorised version of the function should be used. Default when nb_peptides < 30. If nb_peptides > 30, there is a high risk that the vectorised version would be slower. |

A tibble providing the parameters of the multivariate posterior t-distribution for the mean of the considered groups and draws for each peptide.
Compute a (high speed) quadrature approximation of the overlapping coefficient between two univariate t-distributions with arbitrary mean, variance and degrees of freedom.
overlap_coef(mean1, mean2, var1, var2, df1, df2)

| Argument | Description |
|---|---|
| mean1 | A vector, the mean parameter of a t-distribution |
| mean2 | A vector, the mean parameter of the other t-distribution |
| var1 | A matrix, the variance parameter of a t-distribution |
| var2 | A matrix, the variance parameter of the other t-distribution |
| df1 | A number, the degrees of freedom of a t-distribution |
| df2 | A number, the degrees of freedom of the other t-distribution |
| nb_sample | A number of samples drawn to compute the Monte Carlo estimation |

A number, the approximation of the overlapping coefficient between the two univariate t-distributions.
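For univariate t-distributions, the overlapping coefficient ∫ min(f1, f2) can be approximated by numerical quadrature. The sketch below uses base R's `integrate()` on a finite range; it is an illustration only, the helper names `dt_ls` and `overlap_quadrature` are hypothetical, and `var1` and `var2` are assumed to be squared scale parameters.

```r
# Density of a location-scale t-distribution
# (assumption: `var` is the squared scale parameter).
dt_ls <- function(x, mean, var, df) {
  dt((x - mean) / sqrt(var), df = df) / sqrt(var)
}

overlap_quadrature <- function(mean1, mean2, var1, var2, df1, df2) {
  # Integrate over a finite range holding almost all the mass of both densities.
  tail_prob <- 1e-6
  lower <- min(mean1 + sqrt(var1) * qt(tail_prob, df1),
               mean2 + sqrt(var2) * qt(tail_prob, df2))
  upper <- max(mean1 + sqrt(var1) * qt(1 - tail_prob, df1),
               mean2 + sqrt(var2) * qt(1 - tail_prob, df2))
  # Pointwise minimum of the two densities.
  integrand <- function(x) {
    pmin(dt_ls(x, mean1, var1, df1), dt_ls(x, mean2, var2, df2))
  }
  integrate(integrand, lower, upper, subdivisions = 1000L)$value
}

same <- overlap_quadrature(0, 0, 1, 1, 10, 10)
near <- overlap_quadrature(0, 0.5, 1, 1, 10, 10)
far <- overlap_quadrature(0, 100, 1, 1, 10, 10)
```

When both distributions share the same scale and degrees of freedom, the densities cross at the midpoint of the two means, so `near` can be checked against the closed form `2 * pt(-0.25, df = 10)`.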
Display the posterior distribution of the difference of means between two
groups for a specific peptide. If only one group is provided, the function
displays the posterior distribution of the mean for this specific group
instead. The function provides additional tools to represent information to
help inference regarding the difference between groups (reference at 0 on the
x-axis, probability of group1 > group2 and conversely).
plot_distrib(sample_distrib, group1 = NULL, group2 = NULL, peptide = NULL, prob_CI = 0.95, show_prob = TRUE, mean_bar = TRUE, index_group1 = NULL, index_group2 = NULL)

| Argument | Description |
|---|---|
| sample_distrib | A data frame, typically coming from the |
| group1 | A character string, corresponding to the name of the group for which we plot the posterior distribution of the mean. If NULL (default), the first group appearing in |
| group2 | A character string, corresponding to the name of the group we want to compare to |
| peptide | A character string, corresponding to the name of the peptide for which we plot the posterior distribution of the mean. If NULL (default), only the first appearing in |
| prob_CI | A number, between 0 and 1, corresponding to the level of the Credible Interval (CI), represented as side regions (in red) of the posterior distribution. The default value (0.95) displays the 95% CI, meaning that the central region (in blue) contains 95% of the probability distribution of the mean. |
| show_prob | A boolean, indicating whether we display the label of probability comparisons between the two groups. |
| mean_bar | A boolean, indicating whether we display the vertical bar corresponding to 0 on the x-axis (when comparing two groups), or the mean value of the distribution (when displaying a unique group). |
| index_group1 | A character string, used as the index of |
| index_group2 | A character string, used as the index of |

Plot of the required posterior distribution.
Graphical representation of inference in a multivariate difference analysis context. The plotted empirical distribution represents, for each one-to-one group comparison, the probability of the number of elements for which the mean of a peptide is larger in a given group. This provides visual evidence on whether two groups are differential or not, with adequate uncertainty quantification.
plot_multi_diff(multi_diff, plot_mean = TRUE, cumulative = FALSE)

| Argument | Description |
|---|---|
| multi_diff | A tibble, typically coming from the |
| plot_mean | A boolean, indicating whether the graph for difference of means between all groups for each Peptide should be displayed. |
| cumulative | A boolean, indicating whether the cumulative distribution or the original probability distribution should be displayed. |

A graph (or a matrix of graphs) displaying the multivariate differential inference summary between groups.
Compute the posterior distribution of the means between multiple groups. All peptides are considered independent from one another.
posterior_mean(data, mu_0 = NULL, lambda_0 = 1, beta_0 = 1, alpha_0 = 1)

| Argument | Description |
|---|---|
| data | A tibble or data frame containing imputed data sets for all groups. Required columns: |
| mu_0 | A vector, corresponding to the prior mean. |
| lambda_0 | A number, corresponding to the prior covariance scaling parameter. |
| beta_0 | A matrix, corresponding to the prior covariance parameter. |
| alpha_0 | A number, corresponding to the prior degrees of freedom. |

A tibble providing the empirical posterior distribution for the
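As an illustration of how such hyperparameters can lead to a posterior t-distribution for a mean, the sketch below applies the textbook Normal-Inverse-Gamma conjugate update to a single peptide in a single group. This is an assumption about the underlying model, not a transcription of the package code: the exact parameterisation used by posterior_mean may differ, and the function name `posterior_t_params` is hypothetical.

```r
# Textbook Normal-Inverse-Gamma update for one peptide in one group.
# Returns the parameters of the marginal posterior t-distribution of the mean.
posterior_t_params <- function(x, mu_0 = mean(x), lambda_0 = 1,
                               alpha_0 = 1, beta_0 = 1) {
  n <- length(x)
  x_bar <- mean(x)
  lambda_n <- lambda_0 + n
  mu_n <- (lambda_0 * mu_0 + n * x_bar) / lambda_n
  alpha_n <- alpha_0 + n / 2
  beta_n <- beta_0 + 0.5 * sum((x - x_bar)^2) +
    lambda_0 * n * (x_bar - mu_0)^2 / (2 * lambda_n)
  # Location, squared scale and degrees of freedom of the posterior t.
  list(mean = mu_n, var = beta_n / (alpha_n * lambda_n), df = 2 * alpha_n)
}

intensities <- c(9.8, 10.4, 10.1, 9.9)
post <- posterior_t_params(intensities, mu_0 = 10)
```

With four observations and the default hyperparameters, the posterior location is pulled slightly from the empirical mean (10.05) towards the prior mean (10), and the degrees of freedom grow from 2 to 6.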
Sample from a (possibly multivariate) t-distribution. This function can be used to sample both from a prior or posterior, depending on the value of parameters provided.
sample_distrib(posterior, nb_sample = 1000)

| Argument | Description |
|---|---|
| posterior | A tibble or data frame, detailing for each |
| nb_sample | A number, indicating the number of samples generated for each couple |

A tibble containing the Peptide, Group and
Sample columns. The samples of each Peptide-Group
couple provide an empirical t-distribution that can be used to compute and
display differences between groups.
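The way such samples can be used to compare groups may be sketched in base R as follows. The example draws from two location-scale t-distributions for one peptide, then summarises the difference between groups; the helper `rt_ls`, the toy parameter values and the variable names are assumptions made for illustration.

```r
set.seed(123)

# Draw from a location-scale t-distribution
# (assumption: `var` is the squared scale parameter).
rt_ls <- function(n, mean, var, df) {
  mean + sqrt(var) * rt(n, df = df)
}

nb_sample <- 10000
sample_g1 <- rt_ls(nb_sample, mean = 10.0, var = 0.05, df = 6)
sample_g2 <- rt_ls(nb_sample, mean = 10.8, var = 0.05, df = 6)

# Empirical distribution of the difference of means between the two groups.
diff_samples <- sample_g1 - sample_g2

# Empirical probability that the mean of group 1 exceeds that of group 2.
prob_g1_greater <- mean(diff_samples > 0)

# Empirical 95% credible interval of the difference.
ci_95 <- quantile(diff_samples, probs = c(0.025, 0.975))
```

Since the location of group 2 is larger in this toy setting, `prob_g1_greater` is expected to be small and the interval to be centred near -0.8.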
Simulate a complete training dataset, which may be representative of various applications. Several flexible arguments allow adjustment of the number of peptides, of groups, and samples in each experiment. The values of several parameters controlling the data generation process can be modified.
simu_db(nb_peptide = 5, nb_group = 2, nb_sample = 5, multi_imp = FALSE, nb_draw = 5, range_peptide = c(0, 50), diff_group = 3, var_sample = 2, var_draw = 1)

| Argument | Description |
|---|---|
| nb_peptide | An integer, indicating the number of peptides in the data. |
| nb_group | An integer, indicating the number of groups/conditions. |
| nb_sample | An integer, indicating the number of samples in the data for each peptide (i.e. the repetitions of the same experiment). |
| multi_imp | A boolean, indicating whether multiple imputations have been applied to obtain the dataset. |
| nb_draw | A number, indicating the number of imputation procedures applied to obtain this dataset. |
| range_peptide | A 2-sized vector, indicating the range of values from which to pick a mean value for each peptide. |
| diff_group | A number, indicating the mean difference between consecutive groups. |
| var_sample | A number, indicating the noise variance for each new sample of a peptide. |
| var_draw | A number, indicating the noise variance for each imputation draw. |

A full dataset of synthetic data.

    ## Generate a dataset with 5 peptides in each of the 2 groups, observed for
    ## 3 different samples
    data = simu_db(nb_peptide = 5, nb_group = 2, nb_sample = 3)

    ## Generate a dataset with 3 peptides in each of the 3 groups, observed for
    ## 4 different samples, for which 5 imputation draws are available.
    data = simu_db(nb_peptide = 3, nb_group = 3, nb_sample = 4, nb_draw = 5)

Alternative vectorised version, highly efficient when nb_peptide < 30.
vectorised_multi(data, mu_0 = NULL, lambda_0 = 1, Sigma_0 = NULL, nu_0 = 10)

| Argument | Description |
|---|---|
| data | A tibble or data frame containing imputed data sets for all groups. Required columns: |
| mu_0 | A vector, corresponding to the prior mean. If NULL, all groups are initialised with the same empirical mean for each peptide. |
| lambda_0 | A number, corresponding to the prior covariance scaling parameter. |
| Sigma_0 | A matrix, corresponding to the prior covariance parameter. If NULL, the identity matrix will be used by default. |
| nu_0 | A number, corresponding to the prior degrees of freedom. |

A tibble providing the parameters of the posterior t-distribution for the mean of the considered groups for each peptide.