Simulate mean expression levels for all genes in all samples, with between-sample correlation structure introduced through eQTL effects and with the option to simulate multiple groups (i.e. cell types).
Usage
splatPopSimulateMeans(
vcf = mockVCF(),
params = newSplatPopParams(nGenes = 1000),
verbose = TRUE,
key = NULL,
gff = NULL,
eqtl = NULL,
means = NULL,
...
)
Arguments
- vcf
VariantAnnotation object containing genotypes of samples.
- params
SplatPopParams object containing parameters for population-scale simulations. See SplatPopParams for details.
- verbose
logical. Whether to print progress messages.
- key
Either NULL or a data.frame object containing a full or partial splatPop key.
- gff
Either NULL or a data.frame object containing a GFF/GTF file.
- eqtl
Either NULL or, if simulating population parameters directly from empirical data, a data.frame with empirical/desired eQTL results. To see the required format, run `mockEmpiricalSet()` and see the eqtl output.
- means
Either NULL or, if simulating population parameters directly from empirical data, a Matrix of real gene means across a population, where each row is a gene and each column is an individual in the population. To see the required format, run `mockEmpiricalSet()` and see the means output.
- ...
Any additional parameter settings to override what is provided in params.
Value
A list containing: `means` a matrix (or list of matrices if n.groups > 1) with the simulated mean gene expression value for each gene (row) and each sample (column), `key` a data.frame with population information including eQTL and group effects, and `condition` a named array containing conditional group assignments for each sample.
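As a rough illustration of the return value, the sketch below runs the function with its defaults and inspects the top-level structure of the result. It assumes that splatter, VariantAnnotation and preprocessCore are installed; the object name `sim` is only illustrative, and no particular output is implied.

```r
# Guarded so the sketch is skipped when the required packages are missing.
if (requireNamespace("splatter", quietly = TRUE) &&
    requireNamespace("VariantAnnotation", quietly = TRUE) &&
    requireNamespace("preprocessCore", quietly = TRUE)) {
    library(splatter)

    # Run with the documented defaults, suppressing progress messages.
    sim <- splatPopSimulateMeans(verbose = FALSE)

    # Top-level elements of the returned list.
    str(sim, max.level = 1)

    # Genes are rows and samples are columns when a single group is simulated.
    print(dim(sim$means))

    # First rows of the population key (eQTL and group effect information).
    print(head(sim$key))
}
```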
Details
SplatPopParams can be set in a variety of ways:
1. If not provided, default parameters are used.
2. Default parameters can be overridden by supplying desired parameters using setParams.
3. Parameters can be estimated from real data of your choice using splatPopEstimate.
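The first two options can be sketched as follows. This is a minimal example that assumes splatter is installed; `nGenes = 100` is an arbitrary value chosen for illustration. The third option is not run here because it needs real data to estimate from.

```r
if (requireNamespace("splatter", quietly = TRUE)) {
    library(splatter)

    # 1. Default parameters.
    params <- newSplatPopParams()

    # 2. Override selected defaults (100 genes is an arbitrary choice).
    params <- setParams(params, nGenes = 100)

    # Confirm the override took effect.
    print(getParam(params, "nGenes"))

    # 3. Estimating from real data would use splatPopEstimate; omitted here
    #    because it requires empirical input.
}
```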
`splatPopSimulateMeans` involves the following steps:
Load population key or generate random or GFF/GTF based key.
Format and subset genotype data from the VCF file.
If not in key, assign expression mean and variance to each gene.
If not in key, assign eGene-eSNP pairs and effect sizes.
If not in key and groups > 1, assign a subset of eQTL associations as group-specific and assign DEG group effects.
Simulate mean gene expression matrix without eQTL effects.
Quantile normalize by sample to fit single-cell expression distribution as defined in `splatEstimate`.
Add quantile normalized gene mean and cv info to the eQTL key.
Add eQTL effects to means matrix.
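The last few steps above (simulating means, quantile normalizing by sample, and adding eQTL effects) can be illustrated with a small base R sketch. This is a conceptual toy, not splatter's internal implementation: all object names, the gamma and normal distributions, the one-eSNP-per-gene layout, and the multiplicative effect model are assumptions made for illustration only.

```r
set.seed(1)
n_genes <- 5
n_samples <- 4

# Hypothetical genotypes: minor-allele counts (0, 1, 2), one eSNP per gene.
geno <- matrix(sample(0:2, n_genes * n_samples, replace = TRUE),
               nrow = n_genes, ncol = n_samples)

# Assign a mean and a coefficient of variation (cv) to each gene.
gene_mean <- rgamma(n_genes, shape = 2, rate = 0.5)
gene_cv <- rep(0.3, n_genes)

# Simulate a genes-by-samples means matrix without eQTL effects.
# The mean and sd vectors recycle down the rows (column-major fill).
base <- matrix(rnorm(n_genes * n_samples,
                     mean = gene_mean, sd = gene_mean * gene_cv),
               nrow = n_genes, ncol = n_samples)

# Quantile normalize each sample (column) onto a shared target distribution.
target <- sort(rgamma(n_genes, shape = 2, rate = 0.5))
qn <- apply(base, 2, function(x) target[rank(x, ties.method = "first")])

# Hypothetical per-gene effect sizes (0 means the gene is not an eGene).
effect <- c(0.2, 0, -0.1, 0, 0.3)

# Add eQTL effects: scale each mean by effect size times allele count.
with_eqtl <- qn * (1 + effect * geno)

print(round(with_eqtl, 2))
```

After normalization every column of `qn` is a permutation of `target`, so all samples share the same marginal distribution, and only genes with a non-zero effect size differ between `qn` and `with_eqtl`.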
Examples
# \donttest{
if (requireNamespace("VariantAnnotation", quietly = TRUE) &&
    requireNamespace("preprocessCore", quietly = TRUE)) {
    means <- splatPopSimulateMeans()
}
#> Simulating data for genes in GFF...
#> Simulating gene means for population...
# }