Last updated: 2024-02-27

Checks: 7 passed, 0 failed

Knit directory: paed-inflammation-CITEseq/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20240216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 4741d87. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  .DS_Store
    Untracked:  analysis/05.0_remove_ambient.Rmd
    Untracked:  analysis/06.0_azimuth_annotation.Rmd
    Untracked:  analysis/06.1_azimuth_annotation_decontx.Rmd
    Untracked:  code/dropletutils.R
    Untracked:  code/utility.R
    Untracked:  data/.DS_Store
    Untracked:  data/C133_Neeland_batch0/
    Untracked:  data/C133_Neeland_batch1/
    Untracked:  data/C133_Neeland_batch2/
    Untracked:  data/C133_Neeland_batch3/
    Untracked:  data/C133_Neeland_batch4/
    Untracked:  data/C133_Neeland_batch5/
    Untracked:  data/C133_Neeland_batch6/
    Untracked:  data/CZI_samples_design_with_micro.xlsx
    Untracked:  renv.lock
    Untracked:  renv/

Unstaged changes:
    Modified:   .Rprofile
    Modified:   .gitignore
    Modified:   analysis/01.0_preprocess_batch0.Rmd
    Modified:   analysis/01.1_preprocess_batch1.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/02.0_quality_control.Rmd) and HTML (docs/02.0_quality_control.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

| File | Version | Author | Date | Message |
|------|---------|--------|------|---------|
| Rmd | 4741d87 | Jovana Maksimovic | 2024-02-27 | wflow_publish("analysis/02.0_quality_control.Rmd") |
| html | cab70ad | Jovana Maksimovic | 2024-02-27 | Build site. |
| Rmd | e846b43 | Jovana Maksimovic | 2024-02-27 | wflow_publish("analysis/02.0_quality_control.Rmd") |
| html | ab023d9 | Jovana Maksimovic | 2024-02-27 | Build site. |
| Rmd | 335d800 | Jovana Maksimovic | 2024-02-27 | wflow_publish("analysis/02.0_quality_control.Rmd") |

Load libraries

suppressPackageStartupMessages({
  library(BiocStyle)
  library(tidyverse)
  library(here)
  library(glue)
  library(patchwork)
  library(scran)
  library(scater)
  library(scuttle)
  library(cowplot)
})

source(here("code","utility.R"))

Load data

files <- list.files(here("data",
                         paste0("C133_Neeland_batch", 0:6),
                         "data",
                         "SCEs"),
                    pattern = "preprocessed",
                    full.names = TRUE)
               
sceLst <- sapply(files, function(fn){
  readRDS(file = fn)
})

sceLst
$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch0/data/SCEs/C133_Neeland_batch0.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 33538 34583 
metadata(1): Samples
assays(1): counts
rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
  ENSG00000268674
rowData names(3): ID Symbol Type
colnames(34583): 1_AAACCCAAGCTAGTTC-1 1_AAACCCACAAGATTGA-1 ...
  4_TTTGTTGTCTAGTACG-1 4_TTTGTTGTCTCGAACA-1
colData names(5): Barcode Capture sum detected total
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(0):

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch1/data/SCEs/C133_Neeland_batch1.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 24823 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(24823): 1_AAACCCACACTTCCTG-1 1_AAACCCACAGACAAAT-1 ...
  2_TTTGTTGTCATTGGTG-1 2_TTTGTTGTCGATGGAG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch2/data/SCEs/C133_Neeland_batch2.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 53160 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(53160): 1_AAACCCAAGACCTGGA-1 1_AAACCCAAGACTGTTC-1 ...
  2_TTTGTTGTCTCATGGA-1 2_TTTGTTGTCTCCAAGA-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch3/data/SCEs/C133_Neeland_batch3.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 64842 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(64842): 1_AAACCCAAGCAGCACA-1 1_AAACCCAAGCATCTTG-1 ...
  2_TTTGTTGTCTAGGCCG-1 2_TTTGTTGTCTCGGCTT-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch4/data/SCEs/C133_Neeland_batch4.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 50208 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(50208): 1_AAACCCAAGCGTTAGG-1 1_AAACCCAAGGATTTGA-1 ...
  2_TTTGTTGTCGACGATT-1 2_TTTGTTGTCTAGGCCG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch5/data/SCEs/C133_Neeland_batch5.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 50668 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(50668): 1_AAACCCAAGAAGATCT-1 1_AAACCCAAGATGCAGC-1 ...
  2_TTTGTTGTCGGATTAC-1 2_TTTGTTGTCTGAGAGG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch6/data/SCEs/C133_Neeland_batch6.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 51119 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(51119): 1_AAACCCAAGAAGCGCT-1 1_AAACCCAAGACTCATC-1 ...
  2_TTTGTTGTCGAGAATA-1 2_TTTGTTGTCTACTGAG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

Incorporating gene-based annotation

Having quantified gene expression against the Ensembl gene annotation, we have Ensembl-style identifiers for the genes. These identifiers are used because they are unambiguous and highly stable. However, they are harder to interpret than the gene symbols more commonly used in the literature. Given the Ensembl identifiers, we obtain the corresponding gene symbols using annotation packages available through Bioconductor. Henceforth, we refer to genes by their gene symbols where available and otherwise by their Ensembl-style identifiers[1].
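The helper `add_gene_information()` is sourced from `code/utility.R`, whose code is not shown on this page. As a hypothetical base-R sketch of the naming rule described above (gene symbol where available, otherwise the Ensembl ID), the function `symbol_or_ensembl()` below is illustrative only. Disambiguating duplicated symbols by appending the Ensembl ID is an assumption, similar in spirit to `scuttle::uniquifyFeatureNames()`, and the IDs and symbols in the example are toy values.

```r
# Hypothetical helper (NOT the project's add_gene_information()):
# prefer gene symbols, fall back to Ensembl IDs when no symbol exists.
symbol_or_ensembl <- function(ensembl_id, symbol) {
  stopifnot(length(ensembl_id) == length(symbol))
  # Treat NA and empty strings as "no symbol available"
  missing_sym <- is.na(symbol) | symbol == ""
  out <- ifelse(missing_sym, ensembl_id, symbol)
  # Assumption: symbols shared by several Ensembl IDs are made unique as SYMBOL_ID
  dup <- !missing_sym & (duplicated(symbol) | duplicated(symbol, fromLast = TRUE))
  out[dup] <- paste(symbol[dup], ensembl_id[dup], sep = "_")
  out
}

# Toy identifiers, for illustration only
ids  <- c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D")
syms <- c("CD3E", NA, "DUP", "DUP")
symbol_or_ensembl(ids, syms)
# "CD3E" "ENSG_B" "DUP_ENSG_C" "DUP_ENSG_D"
```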

sceLst <- sapply(sceLst, function(sce){
  sce <- add_gene_information(sce)
  sce
})

sceLst
$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch0/data/SCEs/C133_Neeland_batch0.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 33538 34583 
metadata(1): Samples
assays(1): counts
rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
  ENSG00000268674
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(34583): 1_AAACCCAAGCTAGTTC-1 1_AAACCCACAAGATTGA-1 ...
  4_TTTGTTGTCTAGTACG-1 4_TTTGTTGTCTCGAACA-1
colData names(5): Barcode Capture sum detected total
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(0):

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch1/data/SCEs/C133_Neeland_batch1.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 24823 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(24823): 1_AAACCCACACTTCCTG-1 1_AAACCCACAGACAAAT-1 ...
  2_TTTGTTGTCATTGGTG-1 2_TTTGTTGTCGATGGAG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch2/data/SCEs/C133_Neeland_batch2.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 53160 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(53160): 1_AAACCCAAGACCTGGA-1 1_AAACCCAAGACTGTTC-1 ...
  2_TTTGTTGTCTCATGGA-1 2_TTTGTTGTCTCCAAGA-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch3/data/SCEs/C133_Neeland_batch3.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 64842 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(64842): 1_AAACCCAAGCAGCACA-1 1_AAACCCAAGCATCTTG-1 ...
  2_TTTGTTGTCTAGGCCG-1 2_TTTGTTGTCTCGGCTT-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch4/data/SCEs/C133_Neeland_batch4.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 50208 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(50208): 1_AAACCCAAGCGTTAGG-1 1_AAACCCAAGGATTTGA-1 ...
  2_TTTGTTGTCGACGATT-1 2_TTTGTTGTCTAGGCCG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch5/data/SCEs/C133_Neeland_batch5.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 50668 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(50668): 1_AAACCCAAGAAGATCT-1 1_AAACCCAAGATGCAGC-1 ...
  2_TTTGTTGTCGGATTAC-1 2_TTTGTTGTCTGAGAGG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch6/data/SCEs/C133_Neeland_batch6.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 51119 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(51119): 1_AAACCCAAGAAGCGCT-1 1_AAACCCAAGACTCATC-1 ...
  2_TTTGTTGTCGAGAATA-1 2_TTTGTTGTCTACTGAG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

Quality control

Define the quality control metrics

Low-quality cells need to be removed to ensure that technical effects do not distort downstream analysis results. We use several quality control (QC) metrics to measure the quality of the cells:

  • sum: The library size of each cell, i.e. the total number of UMIs across all genes in the main (Gene Expression) experiment; this dataset contains no spike-in transcripts. We want cells to have large library sizes, as this means more RNA was successfully captured during library preparation.
  • detected: The number of expressed features (genes with non-zero counts)[2] in each cell. Cells with few expressed features are likely to be of poor quality, as the diverse transcript population has not been successfully captured.
  • subsets_Mito_percent: The percentage of UMIs mapped to mitochondrial genes. A higher-than-expected percentage is often symptomatic of a stressed or damaged cell, which is of low quality and should be excluded from the analysis.
  • subsets_Ribo_percent: The percentage of UMIs mapped to ribosomal protein genes. A higher-than-expected percentage is often symptomatic of a cell of compromised quality, which we may want to exclude from the analysis.

In summary, we aim to identify cells with low library sizes, few expressed genes, and very high percentages of mitochondrial and ribosomal protein gene expression.
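To make the metric definitions concrete, here is a toy base-R sketch of what the columns computed by `scuttle::addPerCellQC()` contain. The function name `per_cell_qc()`, the two-cell count matrix, and the `^MT-` and `^RP[SL]` prefixes used to flag mitochondrial and ribosomal protein genes are assumptions for illustration; in the analysis the gene flags come from the `is_mito` and `is_ribo` columns of `rowData`.

```r
# Toy sketch of per-cell QC metrics for a genes x cells count matrix.
# is_mito / is_ribo are logical vectors flagging rows (genes).
per_cell_qc <- function(counts, is_mito, is_ribo) {
  lib_size <- colSums(counts)
  data.frame(
    sum = lib_size,                    # total UMIs per cell
    detected = colSums(counts > 0),    # genes with non-zero counts
    subsets_Mito_percent = 100 * colSums(counts[is_mito, , drop = FALSE]) / lib_size,
    subsets_Ribo_percent = 100 * colSums(counts[is_ribo, , drop = FALSE]) / lib_size
  )
}

# Hypothetical 4-gene, 2-cell matrix (filled column by column)
counts <- matrix(c(10, 0, 5, 5,
                    2, 2, 0, 16),
                 nrow = 4,
                 dimnames = list(c("MT-CO1", "RPL3", "CD3E", "CD14"),
                                 c("cell1", "cell2")))
per_cell_qc(counts,
            is_mito = grepl("^MT-", rownames(counts)),
            is_ribo = grepl("^RP[SL]", rownames(counts)))
# cell1: sum 20, detected 3, 50% mito, 0% ribo
# cell2: sum 20, detected 3, 10% mito, 10% ribo
```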

sceLst <- sapply(sceLst, function(sce){
  
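  # drop QC columns computed during preprocessing so they are not duplicated
  colData(sce) <- colData(sce)[, !str_detect(colnames(colData(sce)), 
                                             "sum|detected|percent|total")]
  # recompute per-cell QC metrics, including the % of counts from
  # mitochondrial and ribosomal protein genes
  sce <- addPerCellQC(sce, 
                      subsets = list(Mito = which(rowData(sce)$is_mito), 
                                     Ribo = which(rowData(sce)$is_ribo)))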
  
  sce
})

Visualise the QC metrics

The QC plots below, one per batch, show that the vast majority of samples are of good quality.

As we would expect, the doublet droplets have larger library sizes and more genes detected. The unassigned droplets generally have smaller library sizes and fewer genes detected.

# for batch 0 each capture is from a different donor
sceLst[[1]]$GeneticDonor <- sceLst[[1]]$Capture

p <- vector("list", length(sceLst))
for(i in seq_along(sceLst)){
  sce <- sceLst[[i]]
  
  
  p1 <- plotColData(
    sce,
    "sum",
    x = "GeneticDonor",
    other_fields = c("Capture"),
    colour_by = "GeneticDonor",
    point_size = 1) +
    scale_y_log10() +
    theme(axis.text.x = element_blank()) +
    geom_hline(yintercept = 500,
               linetype = "dotted") +
    annotation_logticks(
      sides = "l",
      short = unit(0.03, "cm"),
      mid = unit(0.06, "cm"),
      long = unit(0.09, "cm"))
  p2 <- plotColData(
    sce,
    "detected",
    x = "GeneticDonor",
    other_fields = c("Capture"),
    colour_by = "GeneticDonor",
    point_size = 1) +
    theme(axis.text.x = element_blank())
  p3 <- plotColData(
    sce,
    "subsets_Mito_percent",
    x = "GeneticDonor",
    other_fields = c("Capture"),
    colour_by = "GeneticDonor",
    point_size = 1) +
    theme(axis.text.x = element_blank())
  p4 <- plotColData(
    sce,
    "subsets_Ribo_percent",
    x = "GeneticDonor",
    other_fields = c("Capture"),
    colour_by = "GeneticDonor",
    point_size = 1) +
    theme(axis.text.x = element_blank())
  
  p[[i]] <- p1 + p2 + p3 + p4 + 
    plot_layout(guides = "collect", ncol = 2) +
    plot_annotation(title = glue("Batch {i-1}"))
}

p
[[1]] to [[7]] (one figure per batch, titled Batch 0 to Batch 6; panels: sum, detected, subsets_Mito_percent and subsets_Ribo_percent by genetic donor)
Distributions of various QC metrics for all cells in the dataset. This includes the library sizes, number of genes detected, and percentage of reads mapped to mitochondrial genes.
Figure version: ab023d9, Jovana Maksimovic, 2024-02-27

Identify outliers by each metric

Filtering on the mitochondrial proportion can identify stressed or damaged cells, so we seek to identify droplets with unusually large mitochondrial proportions (i.e. outliers). Outlier thresholds are defined in terms of the median absolute deviation (MAD) from the median value of the metric: a droplet is flagged if its %mito is more than 3 MADs above the median. Rather than a single threshold across all cells, we use donor-specific thresholds to account for donor-specific differences[3]; droplets that could not be assigned to a donor (Unknown) are excluded when computing the thresholds.
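The donor-specific thresholding can be sketched in base R. The hypothetical `mad_upper_outliers()` below approximates the `isOutlier(nmads = 3, type = "higher", batch = ..., subset = ...)` call used in the next chunk: each donor's cutoff is its median plus three MADs, computed only from the droplets allowed by `use_for_threshold`. One known difference: for a donor with no reference droplets this sketch returns `NA`, whereas the tables below show that `isOutlier()` still reports cutoffs for the Unknown droplets.

```r
# Hypothetical sketch, not scuttle's implementation.
mad_upper_outliers <- function(metric, donor,
                               use_for_threshold = rep(TRUE, length(metric)),
                               nmads = 3) {
  donor <- as.character(donor)
  donors <- unique(donor)
  # One upper cutoff per donor; vapply keeps the donor names
  thresholds <- vapply(donors, function(d) {
    x <- metric[donor == d & use_for_threshold]
    if (length(x) == 0) return(NA_real_)  # no reference droplets for this donor
    median(x) + nmads * mad(x)            # mad() is scaled by 1.4826 by default
  }, numeric(1))
  list(
    thresholds = thresholds,
    is_outlier = unname(metric > thresholds[donor])  # strictly above the cutoff
  )
}

mito  <- c(1, 2, 3, 4, 100, 5, 6, 7, 8, 50)
donor <- rep(c("A", "B"), each = 5)
res <- mad_upper_outliers(mito, donor)
res$thresholds  # A: 3 + 3 * 1.4826 = 7.4478; B: 7 + 3 * 1.4826 = 11.4478
res$is_outlier  # only 100 (donor A) and 50 (donor B) are flagged
```

Using the MAD rather than the standard deviation keeps the cutoff from being inflated by the very outliers it is meant to detect.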

The following tables (one per batch) summarise the donor-specific %mito cutoffs:

# for batch 0, remove droplets with library size < 500 for consistency with other batches
sceLst[[1]] <- sceLst[[1]][, sceLst[[1]]$sum >= 500]

# identify % mito outliers
sceLst <- sapply(sceLst, function(sce){
  sce$mito_drop <- isOutlier(
    metric = sce$subsets_Mito_percent, 
    nmads = 3, 
    type = "higher",
    batch = sce$GeneticDonor,
    subset = !grepl("Unknown", sce$GeneticDonor))
  
  data.frame(
    sample = factor(
      colnames(attributes(sce$mito_drop)$thresholds),
      levels(sce$GeneticDonor)),
    # type = "higher" yields an upper cutoff only
    upper = attributes(sce$mito_drop)$thresholds["higher", ]) %>%
    arrange(sample) %>%
    knitr::kable(caption = "Sample-specific %mito cutoffs", digits = 1) %>%
    print()
  
  sce
})

Sample-specific %mito cutoffs (Batch 0)

| Donor | Upper %mito cutoff |
|:------|-------------------:|
| A | 19.3 |
| B | 20.1 |
| C | 15.0 |
| D | 14.8 |

Sample-specific %mito cutoffs (Batch 1)

| Donor | Upper %mito cutoff |
|:------|-------------------:|
| A | 6.9 |
| B | 14.5 |
| C | 11.4 |
| D | 12.5 |
| E | 12.9 |
| F | 14.5 |
| G | 11.1 |
| H | 13.1 |
| Doublet | 11.7 |
| Unknown | 11.1 |

Sample-specific %mito cutoffs (Batch 2)

| Donor | Upper %mito cutoff |
|:------|-------------------:|
| A | 12.8 |
| B | 15.3 |
| C | 15.9 |
| D | 15.7 |
| Doublet | 14.4 |
| Unknown | 14.4 |

Sample-specific %mito cutoffs (Batch 3)

| Donor | Upper %mito cutoff |
|:------|-------------------:|
| A | 13.1 |
| B | 9.2 |
| C | 9.6 |
| D | 9.7 |
| E | 9.1 |
| F | 9.2 |
| G | 12.6 |
| H | 9.9 |
| Doublet | 9.3 |
| Unknown | 9.7 |

Sample-specific %mito cutoffs (Batch 4)

| Donor | Upper %mito cutoff |
|:------|-------------------:|
| A | 11.4 |
| B | 10.8 |
| C | 8.0 |
| D | 9.1 |
| E | 10.5 |
| F | 9.0 |
| G | 11.8 |
| Doublet | 9.6 |
| Unknown | 9.6 |

Sample-specific %mito cutoffs (Batch 5)

| Donor | Upper %mito cutoff |
|:------|-------------------:|
| A | 13.1 |
| B | 14.7 |
| C | 16.2 |
| D | 11.4 |
| E | 12.2 |
| F | 15.1 |
| G | 11.7 |
| H | 11.9 |
| Doublet | 13.0 |
| Unknown | 12.9 |

Sample-specific %mito cutoffs (Batch 6)

| Donor | Upper %mito cutoff |
|:------|-------------------:|
| A | 9.2 |
| B | 11.3 |
| C | 11.2 |
| D | 10.1 |
| Doublet | 10.5 |
| Unknown | 10.3 |

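The vast majority of droplets are retained: at least 86.9% for every genetic donor, and more than 93% of doublet droplets. Retention is lowest for the unassigned (Unknown) droplets, at 66.6% to 86.9%.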

sceFlt <- sapply(sceLst, function(sce){
  scePre <- sce
  keep <- !sce$mito_drop
  scePre$keep <- keep
  sce <- sce[, keep]
  
  data.frame(
    ByMito = tapply(
      scePre$mito_drop, 
      scePre$GeneticDonor, 
      sum,
      na.rm = TRUE),
    Remaining = as.vector(unname(table(sce$GeneticDonor))),
    PercRemaining = round(
      100 * as.vector(unname(table(sce$GeneticDonor))) /
        as.vector(
          unname(
            table(scePre$GeneticDonor))), 1)) %>%
    tibble::rownames_to_column("GeneticDonor") %>%
    dplyr::arrange(dplyr::desc(PercRemaining)) %>%
    knitr::kable(
      caption = "Number of droplets removed by each QC step and the number of droplets remaining.") %>%
    print()
  
  sce
})
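
Number of droplets removed by each QC step and the number of droplets remaining (Batch 0).

| GeneticDonor | ByMito | Remaining | PercRemaining |
|:-------------|-------:|----------:|--------------:|
| D | 994 | 9820 | 90.8 |
| C | 946 | 9129 | 90.6 |
| A | 462 | 3620 | 88.7 |
| B | 588 | 4370 | 88.1 |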
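
Number of droplets removed by each QC step and the number of droplets remaining (Batch 1).

| GeneticDonor | ByMito | Remaining | PercRemaining |
|:-------------|-------:|----------:|--------------:|
| G | 174 | 3093 | 94.7 |
| H | 164 | 2554 | 94.0 |
| Doublet | 137 | 1904 | 93.3 |
| D | 211 | 2846 | 93.1 |
| C | 172 | 2219 | 92.8 |
| F | 198 | 2082 | 91.3 |
| A | 348 | 3450 | 90.8 |
| B | 218 | 1615 | 88.1 |
| E | 300 | 2207 | 88.0 |
| Unknown | 276 | 655 | 70.4 |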
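
Number of droplets removed by each QC step and the number of droplets remaining (Batch 2).

| GeneticDonor | ByMito | Remaining | PercRemaining |
|:-------------|-------:|----------:|--------------:|
| Doublet | 368 | 8723 | 96.0 |
| A | 1039 | 15039 | 93.5 |
| B | 165 | 1962 | 92.2 |
| C | 422 | 4289 | 91.0 |
| D | 1665 | 15698 | 90.4 |
| Unknown | 496 | 3294 | 86.9 |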
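
Number of droplets removed by each QC step and the number of droplets remaining (Batch 3).

| GeneticDonor | ByMito | Remaining | PercRemaining |
|:-------------|-------:|----------:|--------------:|
| Doublet | 377 | 10627 | 96.6 |
| C | 515 | 11404 | 95.7 |
| E | 518 | 10324 | 95.2 |
| H | 319 | 5815 | 94.8 |
| B | 488 | 7233 | 93.7 |
| D | 290 | 3854 | 93.0 |
| F | 187 | 2365 | 92.7 |
| G | 398 | 3887 | 90.7 |
| A | 370 | 2529 | 87.2 |
| Unknown | 1053 | 2289 | 68.5 |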
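
Number of droplets removed by each QC step and the number of droplets remaining (Batch 4).

| GeneticDonor | ByMito | Remaining | PercRemaining |
|:-------------|-------:|----------:|--------------:|
| D | 460 | 11833 | 96.3 |
| Doublet | 265 | 6289 | 96.0 |
| G | 188 | 3911 | 95.4 |
| C | 325 | 6631 | 95.3 |
| E | 662 | 11919 | 94.7 |
| A | 5 | 66 | 93.0 |
| F | 258 | 2844 | 91.7 |
| B | 250 | 2267 | 90.1 |
| Unknown | 619 | 1416 | 69.6 |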
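
Number of droplets removed by each QC step and the number of droplets remaining (Batch 5).

| GeneticDonor | ByMito | Remaining | PercRemaining |
|:-------------|-------:|----------:|--------------:|
| Doublet | 294 | 6169 | 95.5 |
| G | 549 | 9728 | 94.7 |
| C | 441 | 7479 | 94.4 |
| H | 510 | 8392 | 94.3 |
| D | 303 | 2729 | 90.0 |
| B | 159 | 1124 | 87.6 |
| E | 700 | 4937 | 87.6 |
| F | 386 | 2679 | 87.4 |
| A | 222 | 1478 | 86.9 |
| Unknown | 613 | 1776 | 74.3 |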
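
Number of droplets removed by each QC step and the number of droplets remaining (Batch 6).

| GeneticDonor | ByMito | Remaining | PercRemaining |
|:-------------|-------:|----------:|--------------:|
| Doublet | 228 | 6388 | 96.6 |
| D | 924 | 14327 | 93.9 |
| B | 576 | 8652 | 93.8 |
| A | 736 | 9377 | 92.7 |
| C | 817 | 7741 | 90.5 |
| Unknown | 452 | 901 | 66.6 |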
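
Each row of these tables can be derived from two per-droplet vectors: the donor assignment and the %mito outlier flag. The following toy base-R sketch shows the arithmetic; the function name `retention_summary()` and the example vectors are hypothetical, while the column names mirror the tables.

```r
# Toy sketch: ByMito = droplets flagged by the %mito filter,
# Remaining = droplets kept,
# PercRemaining = 100 * Remaining / (Remaining + ByMito)
retention_summary <- function(donor, drop) {
  removed   <- tapply(drop, donor, sum)
  remaining <- tapply(!drop, donor, sum)
  out <- data.frame(
    GeneticDonor = names(removed),
    ByMito = as.vector(removed),
    Remaining = as.vector(remaining),
    PercRemaining = round(100 * as.vector(remaining) / as.vector(removed + remaining), 1),
    stringsAsFactors = FALSE
  )
  # Sort by retention, highest first, as in the tables
  out[order(out$PercRemaining, decreasing = TRUE), ]
}

retention_summary(donor = c("A", "A", "A", "A", "B", "B"),
                  drop  = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE))
# A: 1 removed, 3 remaining, 75%; B: 1 removed, 1 remaining, 50%
```

For example, donor G in batch 1 has 174 droplets removed and 3093 retained, giving 100 × 3093 / (3093 + 174) ≈ 94.7%.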

Of concern is whether the droplets removed during QC preferentially derive from particular genetic donors. Reassuringly, the bar plots below show that this is not the case.

p <- lapply(seq_along(sceLst), function(i){
  sce <- sceLst[[i]]
  flt <- sceFlt[[i]]
  
  sce$keep <- colnames(sce) %in% colnames(flt)
  ggcells(sce) +
    geom_bar(aes(x = GeneticDonor, fill = keep)) + 
    ylab("Number of droplets") + 
    theme_cowplot(font_size = 7) + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    facet_grid(GeneticDonor ~ ., scales = "free_y")
})

p
[[1]] to [[7]] (one bar plot per batch: number of droplets per genetic donor, filled by whether each droplet was kept)
Droplets removed during QC, stratified by Sample.
Figure version: ab023d9, Jovana Maksimovic, 2024-02-27

Finally, the plots below compare the QC metrics of the discarded and retained droplets.

p <- lapply(seq_along(sceLst), function(i){
  sce <- sceLst[[i]]
  flt <- sceFlt[[i]]
  
  sce$keep <- colnames(sce) %in% colnames(flt)
  
  p1 <- plotColData(
    sce,
    "sum",
    x = "GeneticDonor",
    colour_by = "keep",
    point_size = 0.5) +
    scale_y_log10() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    annotation_logticks(
      sides = "l",
      short = unit(0.03, "cm"),
      mid = unit(0.06, "cm"),
      long = unit(0.09, "cm"))
  p2 <- plotColData(
    sce,
    "detected",
    x = "GeneticDonor",
    colour_by = "keep",
    point_size = 0.5) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  p3 <- plotColData(
    sce,
    "subsets_Mito_percent",
    x = "GeneticDonor",
    colour_by = "keep",
    point_size = 0.5) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  p4 <- plotColData(
    sce,
    "subsets_Ribo_percent",
    x = "GeneticDonor",
    colour_by = "keep",
    point_size = 0.5) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  p1 + p2 + p3 + p4 + plot_layout(guides = "collect")
})

p
[[1]] to [[7]] (one figure per batch; panels: sum, detected, subsets_Mito_percent and subsets_Ribo_percent by genetic donor, coloured by keep)
Distribution of QC metrics for each genetic donor in the batch. Each point represents a droplet and is coloured according to whether it was retained or discarded during the QC process. At this stage droplets are discarded only if they exceed their donor's %mito threshold.
Figure version: ab023d9, Jovana Maksimovic, 2024-02-27

Filter out unassigned droplets

Remove droplets that genetic demultiplexing could not assign to a donor (i.e. those whose GeneticDonor is "Unknown").

# Keep only droplets that genetic demultiplexing assigned to a donor
sceFlt <- lapply(sceFlt, function(sce){
  sce[, sce$GeneticDonor != "Unknown"]
})
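The filter above compares GeneticDonor against "Unknown". Below is a minimal base-R sketch of the same step on a toy data frame, which is a hypothetical stand-in for one batch's colData. It also shows a defensive variant. If any GeneticDonor values were NA rather than "Unknown" (an assumption; the source does not say this can happen), then != would return NA for those droplets, and a logical index containing NA does not give a clean keep/drop decision.

```r
# Toy stand-in for the colData of one batch (hypothetical values)
cd <- data.frame(
  GeneticDonor = c("donor1", "Unknown", "donor2", NA, "donor1"),
  stringsAsFactors = FALSE
)

# `!=` propagates NA, so the 4th droplet gets NA instead of TRUE/FALSE
keep_neq <- cd$GeneticDonor != "Unknown"

# NA-safe variant: droplets with a missing donor label are treated as unassigned
keep_safe <- !is.na(cd$GeneticDonor) & cd$GeneticDonor != "Unknown"

# Report how many droplets are kept versus removed, then filter
print(table(kept = keep_safe))
cd_flt <- cd[keep_safe, , drop = FALSE]
```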

QC summary

We had already removed droplets with unusually small library sizes or numbers of detected genes when identifying empty droplets. We have now additionally removed droplets whose mitochondrial proportions we deem to be outliers.
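The outlier rule is not restated at this point. A common choice, implemented by scuttle::isOutlier() with its default nmads = 3, flags values that lie more than a fixed number of median absolute deviations (MADs) from the median. The base-R sketch below assumes that rule, applies it to the upper tail only, and uses hypothetical mitochondrial percentages. The helper name flag_high_outliers() is hypothetical. Its subset argument mirrors footnote 3: the threshold is estimated from assigned droplets only, but every droplet is still flagged. In the example, the last droplet is marked unassigned, so it is excluded from the estimate yet still flagged.

```r
# Flag values more than `nmads` MADs above the median as outliers.
# `subset` restricts which droplets are used to estimate the threshold
# (e.g. only droplets assigned to a donor); all droplets are then flagged.
flag_high_outliers <- function(x, nmads = 3, subset = rep(TRUE, length(x))) {
  centre <- median(x[subset], na.rm = TRUE)
  spread <- mad(x[subset], na.rm = TRUE)  # mad() scales by 1.4826 by default
  threshold <- centre + nmads * spread
  out <- x > threshold
  attr(out, "threshold") <- threshold
  out
}

# Hypothetical mitochondrial percentages for eight droplets
mito_pct <- c(2.1, 3.4, 2.8, 3.0, 2.5, 35.0, 3.2, 40.0)
assigned <- c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)

is_out <- flag_high_outliers(mito_pct, nmads = 3, subset = assigned)
print(attr(is_out, "threshold"))
print(which(is_out))
```

Using the median and MAD rather than the mean and standard deviation keeps the threshold from being dragged upwards by the very outliers it is meant to catch.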

To conclude, Figure @ref(fig:qcplot-post-outlier-removal) shows that, following QC, most samples have similar QC metrics, as expected, and Figure @ref(fig:experiment-by-donor-postqc) summarises the experimental design following QC.
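The claim that most samples have similar QC metrics can also be checked numerically. The minimal sketch below computes per-donor medians with base R's aggregate(). It assumes the metrics have been pulled out of colData into a plain data frame, and the values are hypothetical.

```r
# Hypothetical per-droplet QC metrics, using the same column names as the
# colData fields plotted below (sum, detected, subsets_Mito_percent)
qc <- data.frame(
  GeneticDonor = c("donorA", "donorA", "donorA", "donorB", "donorB", "donorB"),
  sum = c(5000, 6000, 7000, 4000, 4500, 5000),
  detected = c(1500, 1800, 2000, 1200, 1300, 1400),
  subsets_Mito_percent = c(2, 3, 4, 5, 6, 7)
)

# Median of each metric per donor
qc_medians <- aggregate(
  cbind(sum, detected, subsets_Mito_percent) ~ GeneticDonor,
  data = qc,
  FUN = median
)
print(qc_medians)
```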

p <- lapply(sceFlt, function(sce){
  p1 <- plotColData(
    sce,
    "sum",
    x = "GeneticDonor",
    other_fields = c("Capture", "GeneticDonor"),
    colour_by = "GeneticDonor",
    point_size = 0.5) +
    scale_y_log10() +
    theme(axis.text.x = element_blank()) +
    annotation_logticks(
      sides = "l",
      short = unit(0.03, "cm"),
      mid = unit(0.06, "cm"),
      long = unit(0.09, "cm"))
  p2 <- plotColData(
    sce,
    "detected",
    x = "GeneticDonor",
    other_fields = c("Capture", "GeneticDonor"),
    colour_by = "GeneticDonor",
    point_size = 0.5) +
    theme(axis.text.x = element_blank())
  p3 <- plotColData(
    sce,
    "subsets_Mito_percent",
    x = "GeneticDonor",
    other_fields = c("Capture", "GeneticDonor"),
    colour_by = "GeneticDonor",
    point_size = 0.5) +
    theme(axis.text.x = element_blank())
  p4 <- plotColData(
    sce,
    "subsets_Ribo_percent",
    x = "GeneticDonor",
    other_fields = c("Capture", "GeneticDonor"),
    colour_by = "GeneticDonor",
    point_size = 0.5) +
    theme(axis.text.x = element_blank())
  p1 + p2 + p3 + p4 + plot_layout(guides = "collect", ncol = 2)
})

p
$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch0/data/SCEs/C133_Neeland_batch0.preprocessed.SCE.rds`
Distributions of various QC metrics for all cells passing QC: library sizes, numbers of detected genes, and the proportions of reads mapped to mitochondrial and ribosomal genes.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch1/data/SCEs/C133_Neeland_batch1.preprocessed.SCE.rds`
Distributions of various QC metrics for all cells passing QC: library sizes, numbers of detected genes, and the proportions of reads mapped to mitochondrial and ribosomal genes.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch2/data/SCEs/C133_Neeland_batch2.preprocessed.SCE.rds`
Distributions of various QC metrics for all cells passing QC: library sizes, numbers of detected genes, and the proportions of reads mapped to mitochondrial and ribosomal genes.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch3/data/SCEs/C133_Neeland_batch3.preprocessed.SCE.rds`
Distributions of various QC metrics for all cells passing QC: library sizes, numbers of detected genes, and the proportions of reads mapped to mitochondrial and ribosomal genes.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch4/data/SCEs/C133_Neeland_batch4.preprocessed.SCE.rds`
Distributions of various QC metrics for all cells passing QC: library sizes, numbers of detected genes, and the proportions of reads mapped to mitochondrial and ribosomal genes.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch5/data/SCEs/C133_Neeland_batch5.preprocessed.SCE.rds`
Distributions of various QC metrics for all cells passing QC: library sizes, numbers of detected genes, and the proportions of reads mapped to mitochondrial and ribosomal genes.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch6/data/SCEs/C133_Neeland_batch6.preprocessed.SCE.rds`
Distributions of various QC metrics for all cells passing QC: library sizes, numbers of detected genes, and the proportions of reads mapped to mitochondrial and ribosomal genes.

Add a dmmHTO column to the batch0 object, copied from its Capture column, so that it aligns with the other batches. Then plot the breakdown of droplets per donor.

# batch0 lacks a dmmHTO column; copy its Capture labels so all batches share the same fields
batch <- grep("batch0", names(sceFlt))
sceFlt[[batch]]$dmmHTO <- sceFlt[[batch]]$Capture
p <- lapply(sceFlt, function(sce){
  p1 <- ggcells(sce) + 
    geom_bar(
      aes(x = GeneticDonor, fill = dmmHTO),
      position = position_fill(reverse = TRUE)) +
    coord_flip() +
    ylab("Frequency") +
    theme_cowplot(font_size = 10) 
  p2 <- ggcells(sce) + 
    geom_bar(
      aes(x = GeneticDonor, fill = Capture),
      position = position_fill(reverse = TRUE)) +
    coord_flip() +
    ylab("Frequency") +
    theme_cowplot(font_size = 10)
  p3 <- ggcells(sce) + 
    geom_bar(aes(x = GeneticDonor, fill = GeneticDonor)) + 
    coord_flip() + 
    ylab("Number of droplets") + 
    theme_cowplot(font_size = 10) + 
    geom_text(stat = "count", aes(x = GeneticDonor, label = after_stat(count)), hjust = 1.5, size = 2) +
    guides(fill = "none")
  p1 / p2 / p3 + plot_layout(guides = "collect")
})

p
$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch0/data/SCEs/C133_Neeland_batch0.preprocessed.SCE.rds`
Breakdown of the samples following QC.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch1/data/SCEs/C133_Neeland_batch1.preprocessed.SCE.rds`
Breakdown of the samples following QC.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch2/data/SCEs/C133_Neeland_batch2.preprocessed.SCE.rds`
Breakdown of the samples following QC.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch3/data/SCEs/C133_Neeland_batch3.preprocessed.SCE.rds`
Breakdown of the samples following QC.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch4/data/SCEs/C133_Neeland_batch4.preprocessed.SCE.rds`
Breakdown of the samples following QC.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch5/data/SCEs/C133_Neeland_batch5.preprocessed.SCE.rds`
Breakdown of the samples following QC.

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch6/data/SCEs/C133_Neeland_batch6.preprocessed.SCE.rds`
Breakdown of the samples following QC.
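
Each breakdown figure has three panels, and each panel corresponds to a simple tabulation. The first two panels show, for each donor, the fraction of droplets from each dmmHTO or Capture level; position_fill() rescales each bar to 1. The third panel shows raw droplet counts per donor. The base-R sketch below reproduces these with table() and prop.table() on hypothetical labels.

```r
# Hypothetical donor and hashtag (HTO) labels for ten droplets
donor <- c("d1", "d1", "d1", "d1", "d2", "d2", "d2", "d2", "d2", "d2")
hto   <- c("H1", "H1", "H2", "H2", "H1", "H2", "H2", "H2", "H2", "H2")

# Droplet counts per donor (the numbers printed on the bars of the third panel)
counts <- table(GeneticDonor = donor)

# Row-wise proportions: each donor's droplets split by HTO, matching the
# stacked, rescaled bars of the first panel
props <- prop.table(table(GeneticDonor = donor, dmmHTO = hto), margin = 1)

print(counts)
print(round(props, 2))
```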

Save data

batches <- str_extract(names(sceFlt), "batch[0-6]")

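# Save each quality-filtered SCE alongside its preprocessed counterpart
sapply(seq_along(sceFlt), function(i){
  out <- here("data",
              paste0("C133_Neeland_", batches[i]),
              "data", 
              "SCEs", 
              glue("C133_Neeland_{batches[i]}.quality_filtered.SCE.rds"))
  # Only write the file if it does not already exist
  if(!file.exists(out)) saveRDS(sceFlt[[i]], out)
  # Make the file group-writable and, where the group exists, owned by the lab group
  fs::file_chmod(out, "664")
  if(any(str_detect(fs::group_ids()$group_name, 
                    "oshlack_lab"))) fs::file_chown(out, 
                                                    group_id = "oshlack_lab")
})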
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

[[5]]
NULL

[[6]]
NULL

[[7]]
NULL
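The save chunk writes each file only if it does not already exist. Re-running the analysis therefore never overwrites an earlier result, but by the same token it will not refresh a stale one. The permission change runs either way. The [[i]] NULL lines above are simply sapply() collecting each iteration's return value.

The base-R sketch below follows the same pattern, with file.path() and sprintf() standing in for here() and glue(), and Sys.chmod() for fs::file_chmod(). The function name save_if_absent() is hypothetical, files go to a temporary directory, and the group-ownership step is left out.

```r
# Base-R sketch of the "write once, then set permissions" pattern.
# The layout mimics data/C133_Neeland_<batch>/data/SCEs/ under a temp root.
save_if_absent <- function(obj, batch, root) {
  out_dir <- file.path(root, paste0("C133_Neeland_", batch), "data", "SCEs")
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  out <- file.path(out_dir,
                   sprintf("C133_Neeland_%s.quality_filtered.SCE.rds", batch))
  written <- FALSE
  if (!file.exists(out)) {
    saveRDS(obj, out)
    written <- TRUE
  }
  # Permissions are (re)applied whether or not the file was just written
  Sys.chmod(out, mode = "664", use_umask = FALSE)
  invisible(list(path = out, written = written))
}

root <- tempfile("save-demo")
first  <- save_if_absent(list(a = 1), "batch0", root = root)
second <- save_if_absent(list(a = 2), "batch0", root = root)
print(c(first$written, second$written))  # the second call does not overwrite
print(readRDS(first$path))
```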

Session info


sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] msigdbr_7.5.1                          
 [2] Homo.sapiens_1.3.1                     
 [3] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [4] org.Hs.eg.db_3.18.0                    
 [5] GO.db_3.18.0                           
 [6] OrganismDbi_1.44.0                     
 [7] EnsDb.Hsapiens.v86_2.99.0              
 [8] ensembldb_2.26.0                       
 [9] AnnotationFilter_1.26.0                
[10] GenomicFeatures_1.54.3                 
[11] AnnotationDbi_1.64.1                   
[12] cowplot_1.1.3                          
[13] scater_1.30.1                          
[14] scran_1.30.2                           
[15] scuttle_1.12.0                         
[16] SingleCellExperiment_1.24.0            
[17] SummarizedExperiment_1.32.0            
[18] Biobase_2.62.0                         
[19] GenomicRanges_1.54.1                   
[20] GenomeInfoDb_1.38.6                    
[21] IRanges_2.36.0                         
[22] S4Vectors_0.40.2                       
[23] BiocGenerics_0.48.1                    
[24] MatrixGenerics_1.14.0                  
[25] matrixStats_1.2.0                      
[26] patchwork_1.2.0                        
[27] glue_1.7.0                             
[28] here_1.0.1                             
[29] lubridate_1.9.3                        
[30] forcats_1.0.0                          
[31] stringr_1.5.1                          
[32] dplyr_1.1.4                            
[33] purrr_1.0.2                            
[34] readr_2.1.5                            
[35] tidyr_1.3.1                            
[36] tibble_3.2.1                           
[37] ggplot2_3.4.4                          
[38] tidyverse_2.0.0                        
[39] BiocStyle_2.30.0                       
[40] workflowr_1.7.1                        

loaded via a namespace (and not attached):
  [1] later_1.3.2               BiocIO_1.12.0            
  [3] bitops_1.0-7              filelock_1.0.3           
  [5] graph_1.80.0              XML_3.99-0.16.1          
  [7] lifecycle_1.0.4           edgeR_4.0.15             
  [9] rprojroot_2.0.4           processx_3.8.3           
 [11] lattice_0.22-5            magrittr_2.0.3           
 [13] limma_3.58.1              sass_0.4.8               
 [15] rmarkdown_2.25            jquerylib_0.1.4          
 [17] yaml_2.3.8                metapod_1.10.1           
 [19] httpuv_1.6.14             DBI_1.2.1                
 [21] abind_1.4-5               zlibbioc_1.48.0          
 [23] RCurl_1.98-1.14           rappdirs_0.3.3           
 [25] git2r_0.33.0              GenomeInfoDbData_1.2.11  
 [27] ggrepel_0.9.5             irlba_2.3.5.1            
 [29] dqrng_0.3.2               DelayedMatrixStats_1.24.0
 [31] codetools_0.2-19          DelayedArray_0.28.0      
 [33] xml2_1.3.6                tidyselect_1.2.0         
 [35] farver_2.1.1              ScaledMatrix_1.10.0      
 [37] viridis_0.6.5             BiocFileCache_2.10.1     
 [39] GenomicAlignments_1.38.2  jsonlite_1.8.8           
 [41] BiocNeighbors_1.20.2      tools_4.3.2              
 [43] progress_1.2.3            Rcpp_1.0.12              
 [45] gridExtra_2.3             SparseArray_1.2.4        
 [47] xfun_0.42                 withr_3.0.0              
 [49] BiocManager_1.30.22       fastmap_1.1.1            
 [51] bluster_1.12.0            fansi_1.0.6              
 [53] callr_3.7.3               digest_0.6.34            
 [55] rsvd_1.0.5                timechange_0.3.0         
 [57] R6_2.5.1                  colorspace_2.1-0         
 [59] biomaRt_2.58.2            RSQLite_2.3.5            
 [61] utf8_1.2.4                generics_0.1.3           
 [63] renv_1.0.3                rtracklayer_1.62.0       
 [65] prettyunits_1.2.0         httr_1.4.7               
 [67] S4Arrays_1.2.0            whisker_0.4.1            
 [69] pkgconfig_2.0.3           gtable_0.3.4             
 [71] blob_1.2.4                XVector_0.42.0           
 [73] htmltools_0.5.7           RBGL_1.78.0              
 [75] ProtGenerics_1.34.0       scales_1.3.0             
 [77] png_0.1-8                 knitr_1.45               
 [79] rstudioapi_0.15.0         tzdb_0.4.0               
 [81] rjson_0.2.21              curl_5.2.0               
 [83] cachem_1.0.8              parallel_4.3.2           
 [85] vipor_0.4.7               restfulr_0.0.15          
 [87] pillar_1.9.0              grid_4.3.2               
 [89] vctrs_0.6.5               promises_1.2.1           
 [91] BiocSingular_1.18.0       dbplyr_2.4.0             
 [93] beachmat_2.18.1           cluster_2.1.6            
 [95] beeswarm_0.4.0            evaluate_0.23            
 [97] cli_3.6.2                 locfit_1.5-9.8           
 [99] compiler_4.3.2            Rsamtools_2.18.0         
[101] rlang_1.1.3               crayon_1.5.2             
[103] labeling_0.4.3            ps_1.7.6                 
[105] getPass_0.2-4             fs_1.6.3                 
[107] ggbeeswarm_0.7.2          stringi_1.8.3            
[109] viridisLite_0.4.2         BiocParallel_1.36.0      
[111] babelgene_22.9            munsell_0.5.0            
[113] Biostrings_2.70.2         lazyeval_0.2.2           
[115] Matrix_1.6-5              hms_1.1.3                
[117] sparseMatrixStats_1.14.0  bit64_4.0.5              
[119] KEGGREST_1.42.0           statmod_1.5.0            
[121] highr_0.10                igraph_2.0.1.1           
[123] memoise_2.0.1             bslib_0.6.1              
[125] bit_4.0.5                

  1. Some care is taken to account for missing and duplicate gene symbols; missing symbols are replaced with the Ensembl identifier and duplicated symbols are concatenated with the (unique) Ensembl identifiers.

  2. The number of expressed features refers to the number of genes that have non-zero counts (i.e. genes detected in the cell at least once).

  3. Note that we only use droplets assigned to a sample (i.e. we ignore unassigned droplets) to calculate these thresholds.
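
Footnotes 1 and 2 each describe a small computation, and the base-R sketch below illustrates both on hypothetical inputs. The helper uniquify_symbols() is hypothetical; scuttle::uniquifyFeatureNames() offers similar behaviour. The underscore separator is an assumption.

```r
# Footnote 1 (sketch): replace missing symbols with the Ensembl ID and make
# duplicated symbols unique by appending their Ensembl ID
uniquify_symbols <- function(ensembl, symbol) {
  out <- ifelse(is.na(symbol), ensembl, symbol)
  dup <- out %in% out[duplicated(out)]
  out[dup] <- paste(out[dup], ensembl[dup], sep = "_")
  out
}

ensembl <- c("ENSG01", "ENSG02", "ENSG03", "ENSG04")  # placeholder IDs
symbol  <- c("CD3E", NA, "MATR3", "MATR3")
syms <- uniquify_symbols(ensembl, symbol)
print(syms)

# Footnote 2 (sketch): genes with a non-zero count in each cell
counts <- matrix(c(0, 3, 1,
                   2, 0, 0,
                   0, 0, 5,
                   4, 1, 2),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
detected <- colSums(counts > 0)
print(detected)
```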