Simulate mean expression levels for all genes in all samples. Between-sample correlation structure is simulated using eQTL effects, and multiple groups (i.e. cell types) can optionally be simulated.
Usage
splatPopSimulateMeans(
  vcf = mockVCF(),
  params = newSplatPopParams(nGenes = 1000),
  verbose = TRUE,
  key = NULL,
  gff = NULL,
  eqtl = NULL,
  means = NULL,
  ...
)
Arguments
- vcf
VariantAnnotation object containing genotypes of samples.
- params
SplatPopParams object containing parameters for population scale simulations. See SplatPopParams for details.
- verbose
logical. Whether to print progress messages.
- key
Either NULL or a data.frame object containing a full or partial splatPop key.
- gff
Either NULL or a data.frame object containing a GFF/GTF file.
- eqtl
Either NULL or, if simulating population parameters directly from empirical data, a data.frame with empirical/desired eQTL results. To see the required format, run `mockEmpiricalSet()` and inspect the `eqtl` element of its output.
- means
Either NULL or, if simulating population parameters directly from empirical data, a Matrix of real gene means across a population, where each row is a gene and each column is an individual in the population. To see the required format, run `mockEmpiricalSet()` and inspect the `means` element of its output.
- ...
Any additional parameter settings to override what is provided in params.
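As a hedged illustration of the `eqtl` and `means` arguments, the sketch below passes mock empirical data to `splatPopSimulateMeans`. It assumes that `mockEmpiricalSet()` returns a list with `vcf`, `gff`, `eqtl` and `means` elements, and that splatter, VariantAnnotation and preprocessCore are installed. The object names `have_pkgs`, `emp` and `sim` are illustrative only.

```r
# Sketch only: assumes mockEmpiricalSet() returns a list with
# vcf, gff, eqtl and means elements.
have_pkgs <- requireNamespace("splatter", quietly = TRUE) &&
    requireNamespace("VariantAnnotation", quietly = TRUE) &&
    requireNamespace("preprocessCore", quietly = TRUE)

if (have_pkgs) {
    emp <- splatter::mockEmpiricalSet()

    # Inspect the formats expected by the eqtl and means arguments
    print(head(emp$eqtl))
    print(dim(emp$means))

    # Simulate gene means directly from the mock empirical data
    sim <- splatter::splatPopSimulateMeans(
        vcf = emp$vcf,
        gff = emp$gff,
        eqtl = emp$eqtl,
        means = emp$means,
        verbose = FALSE
    )
    print(names(sim))
}
```

When the packages are not installed, the block only sets `have_pkgs` and does nothing else.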
Value
A list containing: `means` a matrix (or list of matrices if n.groups > 1) with the simulated mean gene expression value for each gene (row) and each sample (column), `key` a data.frame with population information including eQTL and group effects, and `condition` a named array containing conditional group assignments for each sample.
Details
SplatPopParams can be set in a variety of ways. 1. If not provided, default parameters are used. 2. Default parameters can be overridden by supplying desired parameters using setParams. 3. Parameters can be estimated from real data of your choice using splatPopEstimate.
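The three ways of setting parameters described above might look like the following sketch. It assumes splatter is installed, that `splatPopEstimate` accepts `means`, `eqtl` and `params` arguments, and that `mockEmpiricalSet()` provides suitable `means` and `eqtl` inputs. The value `nGenes = 50` and the object names are illustrative.

```r
# Sketch only: the argument names passed to splatPopEstimate are assumptions.
have_splatter <- requireNamespace("splatter", quietly = TRUE)

if (have_splatter) {
    # 1. Default parameters
    params <- splatter::newSplatPopParams()

    # 2. Override a default with setParams
    params <- splatter::setParams(params, nGenes = 50)
    print(splatter::getParam(params, "nGenes"))

    # 3. Estimate parameters from data (mock data stands in for real data)
    if (requireNamespace("VariantAnnotation", quietly = TRUE)) {
        emp <- splatter::mockEmpiricalSet()
        params_est <- splatter::splatPopEstimate(
            means = emp$means,
            eqtl = emp$eqtl,
            params = params
        )
    }
}
```

The same overrides can also be passed through `...` in the call to `splatPopSimulateMeans`, which avoids creating a separate params object for one-off changes.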
`splatPopSimulateMeans` involves the following steps:
1. Load the population key, or generate a random or GFF/GTF-based key.
2. Format and subset genotype data from the VCF file.
3. If not in the key, assign an expression mean and variance to each gene.
4. If not in the key, assign eGene-eSNP pairs and effect sizes.
5. If not in the key and groups > 1, assign a subset of eQTL associations as group-specific and assign DEG group effects.
6. Simulate the mean gene expression matrix without eQTL effects.
7. Quantile normalize by sample to fit the single-cell expression distribution as defined in `splatEstimate`.
8. Add quantile normalized gene mean and cv info to the eQTL key.
9. Add eQTL effects to the means matrix.
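The last few steps (simulating means without eQTL effects, quantile normalizing by sample, and adding eQTL effects) can be sketched in base R as follows. This is a simplified illustration, not splatter's implementation: the gamma target distribution, the additive effect model `genotype * effect * gene mean`, and all object names (`gene_mean`, `gene_cv`, `quantile_to_gamma`, `genotype`, `effect`) are assumptions made for the example.

```r
set.seed(1)
n_genes <- 5
n_samples <- 4

# Hypothetical per-gene mean and coefficient of variation
gene_mean <- rgamma(n_genes, shape = 2, rate = 0.5)
gene_cv <- runif(n_genes, min = 0.1, max = 0.5)

# Simulate means without eQTL effects (rows are genes, columns are samples).
# The matrix is filled column-wise, so row i uses gene_mean[i] and gene_cv[i].
base_means <- matrix(
    rnorm(n_genes * n_samples, mean = gene_mean, sd = gene_mean * gene_cv),
    nrow = n_genes,
    dimnames = list(paste0("gene", seq_len(n_genes)),
                    paste0("sample", seq_len(n_samples)))
)

# Quantile normalize one sample: map each value's rank to the matching
# quantile of an assumed gamma target distribution
quantile_to_gamma <- function(x, shape, rate) {
    p <- (rank(x, ties.method = "first") - 0.5) / length(x)
    qgamma(p, shape = shape, rate = rate)
}
normalized <- apply(base_means, 2, quantile_to_gamma, shape = 2, rate = 0.5)
dimnames(normalized) <- dimnames(base_means)

# Add eQTL effects: genotypes coded 0/1/2, non-zero effect only for eGenes
genotype <- matrix(sample(0:2, n_genes * n_samples, replace = TRUE),
                   nrow = n_genes)
effect <- c(0.3, 0, -0.2, 0, 0)
with_eqtl <- normalized + genotype * effect * rowMeans(normalized)
with_eqtl[with_eqtl < 0] <- 0  # expression means cannot be negative

print(round(with_eqtl, 2))
```

After normalization every sample shares the same set of values and only their assignment to genes differs. Rows with a zero effect size are left unchanged by the eQTL step.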
Examples
# \donttest{
if (requireNamespace("VariantAnnotation", quietly = TRUE) &&
    requireNamespace("preprocessCore", quietly = TRUE)) {
    means <- splatPopSimulateMeans()
}
#> Simulating data for genes in GFF...
#> Simulating gene means for population...
# }