Simulate count data from a fictional single-cell RNA-seq experiment using the Splat method.
splatSimulate(
  params = newSplatParams(),
  method = c("single", "groups", "paths"),
  verbose = TRUE,
  ...
)

splatSimulateSingle(params = newSplatParams(), verbose = TRUE, ...)

splatSimulateGroups(params = newSplatParams(), verbose = TRUE, ...)

splatSimulatePaths(params = newSplatParams(), verbose = TRUE, ...)
Argument | Description |
---|---|
params | SplatParams object containing parameters for the simulation. See SplatParams for details. |
method | which simulation method to use. Options are "single", which produces a single population; "groups", which produces distinct groups (e.g. cell types); or "paths", which selects cells from continuous trajectories (e.g. differentiation processes). |
verbose | logical. Whether to print progress messages. |
... | any additional parameter settings to override what is provided in params. |
SingleCellExperiment object containing the simulated counts and intermediate values.
Parameters can be set in a variety of ways. If no parameters are provided, the default parameters are used. Any parameters in params can be overridden by supplying additional arguments through a call to setParams. This design gives the user flexibility in how they supply parameters and allows small adjustments without creating a new SplatParams object. See examples for a demonstration of how this can be used.
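The two routes described above can be compared side by side. This is a minimal sketch, assuming the splatter package is installed; the parameter values are arbitrary and chosen only for illustration.

```r
library(splatter)

# Start from a custom parameter object
params <- newSplatParams(nGenes = 100, mean.rate = 0.5)

# Route 1: update the object explicitly with setParams()
params_updated <- setParams(params, mean.rate = 0.6, out.prob = 0.2)

# Route 2: pass the same overrides through `...` when simulating
sim <- splatSimulate(params, mean.rate = 0.6, out.prob = 0.2, verbose = FALSE)

# The original object is left untouched by either route
getParam(params, "mean.rate")
getParam(params_updated, "mean.rate")
```

Route 2 is convenient for one-off adjustments, while Route 1 is useful when the modified parameters need to be reused across several simulations.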
The simulation involves the following steps:
1. Set up simulation object
2. Simulate library sizes
3. Simulate gene means
4. Simulate groups/paths
5. Simulate BCV adjusted cell means
6. Simulate true counts
7. Simulate dropout
8. Create final dataset
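To give a feel for how these steps fit together, the following base R sketch walks through a heavily simplified, single-population version of the pipeline. It is not the package's implementation: the distribution choices, the constant BCV, and all numeric values (including the hypothetical names `n_genes`, `n_cells`, `bcv`, `drop_mid`, and `drop_shape`) are assumptions made for illustration only.

```r
set.seed(1)

# Set up: dimensions of the toy dataset (arbitrary values)
n_genes <- 200
n_cells <- 50

# Library sizes: one expected total count per cell (log-normal here)
exp_lib_sizes <- rlnorm(n_cells, meanlog = 11, sdlog = 0.2)

# Gene means: one base expression level per gene (gamma here)
base_gene_means <- rgamma(n_genes, shape = 0.6, rate = 0.3)

# Groups/paths: skipped, which mimics the single-population case

# Cell means: gene proportions scaled to each cell's library size
gene_props <- base_gene_means / sum(base_gene_means)
base_cell_means <- outer(gene_props, exp_lib_sizes)  # genes x cells

# BCV adjustment: gamma draw with shape 1 / BCV^2 and mean equal to the
# base cell mean (a constant BCV is a simplification of this sketch)
bcv <- 0.2
cell_means <- matrix(
  rgamma(n_genes * n_cells, shape = 1 / bcv^2, scale = base_cell_means * bcv^2),
  nrow = n_genes
)

# True counts: Poisson sampling around the adjusted cell means
true_counts <- matrix(rpois(n_genes * n_cells, lambda = cell_means), nrow = n_genes)

# Dropout: logistic function of the log mean gives a dropout probability
drop_mid <- 0
drop_shape <- -1
drop_prob <- 1 / (1 + exp(-drop_shape * (log(cell_means) - drop_mid)))
dropout <- matrix(runif(n_genes * n_cells) < drop_prob, nrow = n_genes)

# Final dataset: zero out the dropped entries
counts <- true_counts
counts[dropout] <- 0
```

Keeping each intermediate matrix (`base_cell_means`, `cell_means`, `true_counts`, `dropout`) mirrors the idea that the simulation records the values from each step alongside the final counts.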
The final output is a SingleCellExperiment object that contains the simulated counts as well as the values from various intermediate steps. These are stored in the colData (for cell-specific information), rowData (for gene-specific information) or assays (for gene-by-cell matrices) slots. This additional information includes:
colData
Unique cell identifier.
The group or path the cell belongs to.
The expected library size for that cell.
How far along the path each cell is.
rowData
Unique gene identifier.
The base expression level for that gene.
Expression outlier factor for that gene. Values of 1 indicate the gene is not an expression outlier.
Expression level after applying outlier factors.
The batch effects factor for each gene for a particular batch.
The differential expression factor for each gene in a particular group. Values of 1 indicate the gene is not differentially expressed.
Factor applied to genes that have non-linear changes in expression along a path.
assays
The mean expression of genes in each cell after adding batch effects.
The mean expression of genes in each cell after any differential expression and adjusted for expected library size.
The Biological Coefficient of Variation for each gene in each cell.
The mean expression level of genes in each cell adjusted for BCV.
The simulated counts before dropout.
Logical matrix showing which values have been dropped in which cells.
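The stored values can be listed directly from a simulated object. This is a minimal sketch, assuming the splatter and SingleCellExperiment packages are installed; the small `nGenes` and `batchCells` values are arbitrary and only keep the example fast.

```r
library(splatter)
library(SingleCellExperiment)

sim <- splatSimulate(nGenes = 100, batchCells = 20, verbose = FALSE)

# Names of the cell-level values stored in colData
colnames(colData(sim))

# Names of the gene-level values stored in rowData
colnames(rowData(sim))

# Names of the gene-by-cell matrices stored as assays
assayNames(sim)

# The final simulated counts (genes x cells)
dim(counts(sim))
```

Which columns and assays appear depends on the simulation method and parameters, so it is worth listing them rather than assuming a fixed set.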
Values that have been added by Splatter are named using UpperCamelCase in order to differentiate them from the values added by analysis packages, which typically use underscore_naming.
Zappia L, Phipson B, Oshlack A. Splatter: simulation of single-cell RNA sequencing data. Genome Biology (2017).
Paper: 10.1186/s13059-017-1305-0
Code: https://github.com/Oshlack/splatter
splatSimLibSizes, splatSimGeneMeans, splatSimBatchEffects, splatSimBatchCellMeans, splatSimDE, splatSimCellMeans, splatSimBCVMeans, splatSimTrueCounts, splatSimDropout
# Simulation with default parameters
sim <- splatSimulate()

if (FALSE) {
  # Simulation with different number of genes
  sim <- splatSimulate(nGenes = 1000)

  # Simulation with custom parameters
  params <- newSplatParams(nGenes = 100, mean.rate = 0.5)
  sim <- splatSimulate(params)

  # Simulation with adjusted custom parameters
  sim <- splatSimulate(params, mean.rate = 0.6, out.prob = 0.2)

  # Simulate groups
  sim <- splatSimulate(method = "groups")

  # Simulate paths
  sim <- splatSimulate(method = "paths")
}