Title: | Bayesian Statistical Tools for Quantitative Proteomics |
---|---|
Description: | Bayesian toolbox for quantitative proteomics. In particular, this package provides functions to generate synthetic datasets, execute Bayesian differential analysis methods and display results, as described in the associated article by Marie Chion and Arthur Leroy (2023) <arXiv:2307.08975>. |
Authors: | Arthur Leroy [aut, cre], Marie Chion [aut] |
Maintainer: | Arthur Leroy |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2024-11-21 23:34:59 UTC |
Source: | https://github.com/mariechion/proteobayes |
Compute a criterion based on Credible Intervals (CI) to determine whether the posterior t-distributions of groups differ enough to deserve further examination. Two groups are considered probably 'distinct' if the Credible Intervals of level CI_level of their respective posterior t-distributions do not overlap.
identify_diff(posterior, CI_level = 0.05, nb_samples = 1000)
posterior |
A tibble, typically coming from a |
CI_level |
A number, defining the order of the quantile used to assess differences between groups. |
nb_samples |
A number (optional), indicating the number of samples to draw from the posteriors for computing means and credible intervals. Only used if |
A tibble, indicating which peptides and groups seem to be different.
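The overlap criterion can be illustrated in base R on posterior samples. This is a minimal sketch, not the package's implementation: the helpers `ci_bounds()` and `ci_distinct()` are hypothetical, and the sketch assumes that CI_level is the lower quantile order, so that CI_level = 0.05 gives an interval between the 5% and 95% empirical quantiles.

```r
# Hypothetical helpers illustrating the CI-overlap criterion on posterior samples.
# Assumption: the interval runs from the CI_level to the (1 - CI_level) quantile.
ci_bounds <- function(samples, CI_level = 0.05) {
  unname(stats::quantile(samples, probs = c(CI_level, 1 - CI_level)))
}

ci_distinct <- function(samples1, samples2, CI_level = 0.05) {
  ci1 <- ci_bounds(samples1, CI_level)
  ci2 <- ci_bounds(samples2, CI_level)
  # [a1, b1] and [a2, b2] overlap if and only if a1 <= b2 and a2 <= b1
  !(ci1[1] <= ci2[2] && ci2[1] <= ci1[2])
}

set.seed(42)
# Samples from three Student-t posteriors (df = 10) with different locations
g1 <- 10 + stats::rt(1000, df = 10)
g2 <- 20 + stats::rt(1000, df = 10)
g3 <- 10.5 + stats::rt(1000, df = 10)
ci_distinct(g1, g2)  # TRUE: the intervals are far apart
ci_distinct(g1, g3)  # FALSE: the intervals overlap
```

Working on samples rather than closed-form quantiles keeps the same check usable for any posterior that can be sampled.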
Compute the multivariate posterior distribution of the means of multiple groups, for multiple correlated peptides. The function accounts for multiple imputations through the Draw identifier in the dataset.
multi_posterior_mean( data, mu_0 = NULL, lambda_0 = 1, Sigma_0 = NULL, nu_0 = 10, vectorised = FALSE )
data |
A tibble or data frame containing imputed data sets for all
groups. Required columns: |
mu_0 |
A vector, corresponding to the prior mean. If NULL, all groups are initialised with the same empirical mean for each peptide. |
lambda_0 |
A number, corresponding to the prior covariance scaling parameter. |
Sigma_0 |
A matrix, corresponding to the prior covariance parameter. If NULL, the identity matrix will be used by default. |
nu_0 |
A number, corresponding to the prior degrees of freedom. |
vectorised |
A boolean, indicating whether to use the vectorised version of the function. Recommended when the number of peptides is below 30; above 30 peptides, the vectorised version is likely to be slower. |
A tibble providing the parameters of the multivariate posterior t-distribution for the mean of the considered groups and draws for each peptide.
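The parameter names mu_0, lambda_0, Sigma_0 and nu_0 match a Normal-Inverse-Wishart prior on the mean vector and covariance of the peptides. Assuming that model (not checked against the package source), the standard conjugate update for one group with n samples of d peptides gives a multivariate t posterior for the mean, sketched below in base R. The function `niw_posterior_mean()` is hypothetical; it handles a single group without imputation draws, and when mu_0 is NULL it falls back to the group's own empirical mean rather than a mean shared across groups.

```r
# Hypothetical sketch, assuming a Normal-Inverse-Wishart prior:
#   mu | Sigma ~ N(mu_0, Sigma / lambda_0),  Sigma ~ IW(Sigma_0, nu_0)
# X holds one row per sample and one column per peptide, for a single group.
niw_posterior_mean <- function(X, mu_0 = NULL, lambda_0 = 1,
                               Sigma_0 = NULL, nu_0 = 10) {
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  x_bar <- colMeans(X)
  if (is.null(mu_0)) mu_0 <- x_bar          # simplification: group's own mean
  if (is.null(Sigma_0)) Sigma_0 <- diag(d)  # identity prior scale matrix
  S <- crossprod(sweep(X, 2, x_bar))        # scatter matrix around x_bar
  lambda_n <- lambda_0 + n
  nu_n <- nu_0 + n
  mu_n <- (lambda_0 * mu_0 + n * x_bar) / lambda_n
  delta <- x_bar - mu_0
  Psi_n <- Sigma_0 + S + (lambda_0 * n / lambda_n) * tcrossprod(delta)
  df <- nu_n - d + 1
  # Marginal posterior of the mean: multivariate t(df, mu_n, Psi_n / (lambda_n * df))
  list(mu = mu_n, Sigma = Psi_n / (lambda_n * df), df = df)
}

set.seed(1)
# 5 samples (rows) of 3 peptides (columns)
X <- matrix(rnorm(15, mean = c(10, 20, 30)), nrow = 5, byrow = TRUE)
post <- niw_posterior_mean(X)
post$df  # 10 + 5 - 3 + 1 = 13
```

The posterior degrees of freedom shrink with the number of peptides d, which is one reason the prior nu_0 must stay large enough relative to the number of correlated peptides.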
Display the posterior distribution of the difference of means between two groups for a specific peptide. If only one group is provided, the function displays the posterior distribution of the mean for this specific group instead. The function provides additional tools to help inference about the difference between groups (a reference at 0 on the x-axis, and the probabilities that group1 > group2 and conversely).
plot_distrib( sample_distrib, group1 = NULL, group2 = NULL, peptide = NULL, prob_CI = 0.95, show_prob = TRUE, mean_bar = TRUE, index_group1 = NULL, index_group2 = NULL )
sample_distrib |
A data frame, typically coming from the
|
group1 |
A character string, corresponding to the name of the group
for which we plot the posterior distribution of the mean. If NULL
(default), the first group appearing in |
group2 |
A character string, corresponding to the name of the group
we want to compare to |
peptide |
A character string, corresponding to the name of the peptide
for which we plot the posterior distribution of the mean. If NULL
(default), only the first appearing in |
prob_CI |
A number between 0 and 1, corresponding to the level of the Credible Interval (CI). The CI corresponds to the central region (in blue) of the displayed posterior distribution, while the tails outside it are shown as side regions (in red). The default value (0.95) displays the 95% CI, meaning that the central region contains 95% of the probability mass of the distribution of the mean. |
show_prob |
A boolean, indicating whether to display the labels of probability comparisons between the two groups. |
mean_bar |
A boolean, indicating whether to display a vertical bar at 0 on the x-axis (when comparing two groups), or at the mean value of the distribution (when displaying a single group). |
index_group1 |
A character string, used as the index of |
index_group2 |
A character string, used as the index of |
Plot of the required posterior distribution.
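The probabilities annotated on the plot can be estimated from posterior samples of the two group means. The sketch below is illustrative only: `compare_groups()` is a hypothetical helper, and it assumes that the two sets of samples are drawn independently and have equal length, so that pairwise differences are samples from the posterior of the difference of means.

```r
# Hypothetical helper: summaries of the posterior of (mean group1 - mean group2),
# assuming samples1 and samples2 are independent posterior draws of equal length.
compare_groups <- function(samples1, samples2, prob_CI = 0.95) {
  delta <- samples1 - samples2            # draws of the difference of means
  side_mass <- (1 - prob_CI) / 2          # probability mass in each side region
  list(
    prob_1_gt_2 = mean(delta > 0),        # estimate of P(group1 > group2)
    prob_2_gt_1 = mean(delta < 0),        # estimate of P(group2 > group1)
    CI = unname(stats::quantile(delta, probs = c(side_mass, 1 - side_mass)))
  )
}

set.seed(7)
s1 <- 12 + stats::rt(1000, df = 8)
s2 <- 10 + stats::rt(1000, df = 8)
res <- compare_groups(s1, s2)
res$prob_1_gt_2
res$CI
```

Checking whether 0 falls inside `res$CI` mirrors the visual check of the reference bar at 0 against the central region of the plot.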
Compute the posterior distribution of the means of multiple groups. All peptides are considered independent from one another.
posterior_mean(data, mu_0 = NULL, lambda_0 = 1, beta_0 = 1, alpha_0 = 1)
data |
A tibble or data frame containing imputed data sets for all
groups. Required columns: |
mu_0 |
A vector, corresponding to the prior mean. |
lambda_0 |
A number, corresponding to the prior covariance scaling parameter. |
beta_0 |
A matrix, corresponding to the prior covariance parameter. |
alpha_0 |
A number, corresponding to the prior degrees of freedom. |
A tibble providing the empirical posterior distribution for the
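Since peptides are treated independently and the prior is parameterised by mu_0, lambda_0, alpha_0 and beta_0, one natural reading is a Normal-Inverse-Gamma prior on the mean and variance of each peptide-group pair. Under that assumption, the conjugate update below yields a Student-t posterior for the mean. `nig_posterior_mean()` is a hypothetical base R helper, not the package's code; it treats beta_0 as a scalar, and the data frame in the usage lines uses made-up column names.

```r
# Hypothetical sketch, assuming a Normal-Inverse-Gamma prior for one
# peptide-group pair: mu | s2 ~ N(mu_0, s2 / lambda_0), s2 ~ IG(alpha_0, beta_0).
nig_posterior_mean <- function(x, mu_0 = NULL, lambda_0 = 1,
                               beta_0 = 1, alpha_0 = 1) {
  n <- length(x)
  x_bar <- mean(x)
  if (is.null(mu_0)) mu_0 <- x_bar        # simplification when no prior mean is given
  lambda_n <- lambda_0 + n
  alpha_n <- alpha_0 + n / 2
  mu_n <- (lambda_0 * mu_0 + n * x_bar) / lambda_n
  beta_n <- beta_0 + 0.5 * sum((x - x_bar)^2) +
    lambda_0 * n * (x_bar - mu_0)^2 / (2 * lambda_n)
  # Marginal posterior of the mean: Student-t with 2 * alpha_n degrees of freedom,
  # location mu_n and scale sqrt(beta_n / (alpha_n * lambda_n))
  c(mu = mu_n, sigma = sqrt(beta_n / (alpha_n * lambda_n)), df = 2 * alpha_n)
}

x <- c(9.8, 10.4, 10.1, 9.6, 10.2)
nig_posterior_mean(x, mu_0 = 10)

# Applied group by group (hypothetical column names Group and Value)
db <- data.frame(Group = rep(c("A", "B"), each = 5), Value = c(x, x + 2))
t(sapply(split(db$Value, db$Group), nig_posterior_mean, mu_0 = 11))
```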
Sample from a (possibly multivariate) t-distribution. This function can be used to sample from either a prior or a posterior, depending on the values of the parameters provided.
sample_distrib(posterior, nb_sample = 1000)
posterior |
A tibble or data frame, detailing for each |
nb_sample |
A number, indicating the number of samples generated for
each couple |
A tibble containing the Peptide, Group and Sample columns. The samples of each Peptide-Group pair provide an empirical t-distribution that can be used to compute and display differences between groups.
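A multivariate t draw can be built from a Gaussian draw divided by an independent chi-square variable. The base R sketch below assumes that the distribution is summarised by a location vector, a scale matrix and degrees of freedom; `rmvt_sketch()` is a hypothetical name, not a package function. In the univariate case, the same construction reduces to `mu + sigma * rt(n, df)`.

```r
# Hypothetical helper: n draws from a d-dimensional Student-t with location mu,
# scale matrix Sigma and df degrees of freedom, using X = mu + Z / sqrt(W / df)
# with Z ~ N(0, Sigma) and W ~ chi-square(df) independent.
rmvt_sketch <- function(n, mu, Sigma, df) {
  d <- length(mu)
  R <- chol(Sigma)                                            # Sigma = t(R) %*% R
  Z <- matrix(stats::rnorm(n * d), nrow = n, ncol = d) %*% R  # rows ~ N(0, Sigma)
  W <- stats::rchisq(n, df = df)
  # Row i is scaled by sqrt(df / W[i]), then shifted by mu
  sweep(Z * sqrt(df / W), 2, mu, "+")
}

set.seed(3)
Sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
draws <- rmvt_sketch(20000, mu = c(0, 5), Sigma = Sigma, df = 10)
colMeans(draws)  # close to c(0, 5)
cov(draws)       # close to Sigma * 10 / 8, the covariance of a t with df = 10
```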
Simulate a complete training dataset, which may be representative of various applications. Several flexible arguments allow adjustment of the number of peptides, groups, and samples in each experiment, and the values of several parameters controlling the data generation process can be modified.
simu_db( nb_peptide = 5, nb_group = 2, nb_sample = 5, multi_imp = FALSE, nb_draw = 5, range_peptide = c(0, 50), diff_group = 3, var_sample = 2, var_draw = 1 )
nb_peptide |
An integer, indicating the number of peptides in the data. |
nb_group |
An integer, indicating the number of groups/conditions. |
nb_sample |
An integer, indicating the number of samples in the data for each peptide (i.e. the repetitions of the same experiment). |
multi_imp |
A boolean, indicating whether multiple imputations have been applied to obtain the dataset. |
nb_draw |
A number, indicating the number of imputation procedures applied to obtain this dataset. |
range_peptide |
A vector of length 2, indicating the range of values from which to pick a mean value for each peptide. |
diff_group |
A number, indicating the mean difference between consecutive groups. |
var_sample |
A number, indicating the noise variance for each new sample of a peptide. |
var_draw |
A number, indicating the noise variance for each imputation draw. |
A full dataset of synthetic data.
## Generate a dataset with 5 peptides in each of the 2 groups, observed for
## 3 different samples
data = simu_db(nb_peptide = 5, nb_group = 2, nb_sample = 3)

## Generate a dataset with 3 peptides in each of the 3 groups, observed for
## 4 different samples, for which 5 imputation draws are available
data = simu_db(nb_peptide = 3, nb_group = 3, nb_sample = 4, multi_imp = TRUE, nb_draw = 5)
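One generative scheme consistent with the argument descriptions is sketched below: a baseline mean per peptide drawn uniformly in range_peptide, a shift of diff_group between consecutive groups, Gaussian sample noise of variance var_sample, and, with multi_imp = TRUE, extra Gaussian noise of variance var_draw for each imputation draw. This is an assumption about how simu_db() might work, not its actual code; `simu_sketch()` and the Output column name are hypothetical.

```r
# Hypothetical sketch of a generative scheme matching the documented arguments;
# not the actual simu_db() code. The Output column name is an assumption.
simu_sketch <- function(nb_peptide = 5, nb_group = 2, nb_sample = 5,
                        multi_imp = FALSE, nb_draw = 5,
                        range_peptide = c(0, 50), diff_group = 3,
                        var_sample = 2, var_draw = 1) {
  # One baseline mean per peptide, drawn uniformly within range_peptide
  base_mean <- stats::runif(nb_peptide, range_peptide[1], range_peptide[2])
  db <- expand.grid(Sample = seq_len(nb_sample), Group = seq_len(nb_group),
                    Peptide = seq_len(nb_peptide))
  # Consecutive groups differ by diff_group; each sample gets Gaussian noise
  mean_value <- base_mean[db$Peptide] + (db$Group - 1) * diff_group
  db$Output <- stats::rnorm(nrow(db), mean = mean_value, sd = sqrt(var_sample))
  if (multi_imp) {
    # Repeat each row nb_draw times and add imputation noise to each draw
    n_obs <- nrow(db)
    db <- db[rep(seq_len(n_obs), each = nb_draw), ]
    db$Draw <- rep(seq_len(nb_draw), times = n_obs)
    db$Output <- db$Output + stats::rnorm(nrow(db), sd = sqrt(var_draw))
    rownames(db) <- NULL
  }
  db[, c("Peptide", "Group", "Sample", if (multi_imp) "Draw", "Output")]
}

set.seed(10)
db <- simu_sketch(nb_peptide = 3, nb_group = 3, nb_sample = 4,
                  multi_imp = TRUE, nb_draw = 5)
head(db)
```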
Alternative vectorised version of multi_posterior_mean(), highly efficient when the number of peptides is below 30.
vectorised_multi(data, mu_0 = NULL, lambda_0 = 1, Sigma_0 = NULL, nu_0 = 10)
data |
A tibble or data frame containing imputed data sets for all
groups. Required columns: |
mu_0 |
A vector, corresponding to the prior mean. If NULL, all groups are initialised with the same empirical mean for each peptide. |
lambda_0 |
A number, corresponding to the prior covariance scaling parameter. |
Sigma_0 |
A matrix, corresponding to the prior covariance parameter. If NULL, the identity matrix will be used by default. |
nu_0 |
A number, corresponding to the prior degrees of freedom. |
A tibble providing the parameters of the posterior t-distribution for the mean of the considered groups for each peptide.