Last updated: 2024-02-27

Checks: 7 passed, 0 failed

Knit directory: paed-inflammation-CITEseq/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

Seed: set.seed(20240216)

The command set.seed(20240216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
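As a minimal illustration of why the seed matters (a base R sketch; the object names `draw1` and `draw2` are hypothetical and not part of this analysis), resetting the same seed before a random subsample reproduces the same draw:

```r
# Hypothetical example: the same seed gives the same random subsample.
set.seed(20240216)
draw1 <- sample(1:1000, size = 5)

# Reset the generator to the same state and draw again
set.seed(20240216)
draw2 <- sample(1:1000, size = 5)

# Both draws are identical because the generator state was reset
identical(draw1, draw2)
```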

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: 4741d87

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 4741d87. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  .DS_Store
    Untracked:  analysis/05.0_remove_ambient.Rmd
    Untracked:  analysis/06.0_azimuth_annotation.Rmd
    Untracked:  analysis/06.1_azimuth_annotation_decontx.Rmd
    Untracked:  code/dropletutils.R
    Untracked:  code/utility.R
    Untracked:  data/.DS_Store
    Untracked:  data/C133_Neeland_batch0/
    Untracked:  data/C133_Neeland_batch1/
    Untracked:  data/C133_Neeland_batch2/
    Untracked:  data/C133_Neeland_batch3/
    Untracked:  data/C133_Neeland_batch4/
    Untracked:  data/C133_Neeland_batch5/
    Untracked:  data/C133_Neeland_batch6/
    Untracked:  data/CZI_samples_design_with_micro.xlsx
    Untracked:  renv.lock
    Untracked:  renv/

Unstaged changes:
    Modified:   .Rprofile
    Modified:   .gitignore
    Modified:   analysis/01.0_preprocess_batch0.Rmd
    Modified:   analysis/01.1_preprocess_batch1.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/02.0_quality_control.Rmd) and HTML (docs/02.0_quality_control.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	4741d87	Jovana Maksimovic	2024-02-27	wflow_publish("analysis/02.0_quality_control.Rmd")
html	cab70ad	Jovana Maksimovic	2024-02-27	Build site.
Rmd	e846b43	Jovana Maksimovic	2024-02-27	wflow_publish("analysis/02.0_quality_control.Rmd")
html	ab023d9	Jovana Maksimovic	2024-02-27	Build site.
Rmd	335d800	Jovana Maksimovic	2024-02-27	wflow_publish("analysis/02.0_quality_control.Rmd")

Load libraries

suppressPackageStartupMessages({
  library(BiocStyle)
  library(tidyverse)
  library(here)
  library(glue)
  library(patchwork)
  library(scran)
  library(scater)
  library(scuttle)
  library(cowplot)
})

source(here("code","utility.R"))

Load data

files <- list.files(here("data",
                         paste0("C133_Neeland_batch", 0:6),
                         "data",
                         "SCEs"),
                    pattern = "preprocessed",
                    full.names = TRUE)
               
sceLst <- sapply(files, function(fn){
  readRDS(file = fn)
})

sceLst

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch0/data/SCEs/C133_Neeland_batch0.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 33538 34583 
metadata(1): Samples
assays(1): counts
rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
  ENSG00000268674
rowData names(3): ID Symbol Type
colnames(34583): 1_AAACCCAAGCTAGTTC-1 1_AAACCCACAAGATTGA-1 ...
  4_TTTGTTGTCTAGTACG-1 4_TTTGTTGTCTCGAACA-1
colData names(5): Barcode Capture sum detected total
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(0):

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch1/data/SCEs/C133_Neeland_batch1.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 24823 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(24823): 1_AAACCCACACTTCCTG-1 1_AAACCCACAGACAAAT-1 ...
  2_TTTGTTGTCATTGGTG-1 2_TTTGTTGTCGATGGAG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch2/data/SCEs/C133_Neeland_batch2.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 53160 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(53160): 1_AAACCCAAGACCTGGA-1 1_AAACCCAAGACTGTTC-1 ...
  2_TTTGTTGTCTCATGGA-1 2_TTTGTTGTCTCCAAGA-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch3/data/SCEs/C133_Neeland_batch3.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 64842 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(64842): 1_AAACCCAAGCAGCACA-1 1_AAACCCAAGCATCTTG-1 ...
  2_TTTGTTGTCTAGGCCG-1 2_TTTGTTGTCTCGGCTT-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch4/data/SCEs/C133_Neeland_batch4.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 50208 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(50208): 1_AAACCCAAGCGTTAGG-1 1_AAACCCAAGGATTTGA-1 ...
  2_TTTGTTGTCGACGATT-1 2_TTTGTTGTCTAGGCCG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch5/data/SCEs/C133_Neeland_batch5.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 50668 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(50668): 1_AAACCCAAGAAGATCT-1 1_AAACCCAAGATGCAGC-1 ...
  2_TTTGTTGTCGGATTAC-1 2_TTTGTTGTCTGAGAGG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch6/data/SCEs/C133_Neeland_batch6.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 51119 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(3): ID Symbol Type
colnames(51119): 1_AAACCCAAGAAGCGCT-1 1_AAACCCAAGACTCATC-1 ...
  2_TTTGTTGTCGAGAATA-1 2_TTTGTTGTCTACTGAG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

Incorporating gene-based annotation

Having quantified gene expression against the Ensembl gene annotation, we have Ensembl-style identifiers for the genes. These identifiers are used as they are unambiguous and highly stable. However, they are difficult to interpret compared to the gene symbols which are more commonly used in the literature. Given the Ensembl identifiers, we obtain the corresponding gene symbols using annotation packages available through Bioconductor. Henceforth, we will use gene symbols (where available) to refer to genes in our analysis and otherwise use the Ensembl-style gene identifiers¹.
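The naming rule described above can be sketched in base R. This is only an illustration under stated assumptions: the table `anno`, its placeholder identifiers and symbols, and the disambiguation step are hypothetical, and it is not necessarily what `add_gene_information()` in code/utility.R does.

```r
# Hypothetical annotation table with placeholder identifiers and symbols
anno <- data.frame(
  ID     = c("ENSG_0001", "ENSG_0002", "ENSG_0003", "ENSG_0004"),
  Symbol = c("GENE1", NA, "GENE3", "GENE1"),
  stringsAsFactors = FALSE)

# Use the symbol where available, otherwise fall back to the identifier
label <- ifelse(is.na(anno$Symbol) | anno$Symbol == "", anno$ID, anno$Symbol)

# Symbols shared by several identifiers are ambiguous, so append the
# identifier to keep every label unique
dup <- label %in% label[duplicated(label)]
label[dup] <- paste0(label[dup], "_", anno$ID[dup])

label
```

Appending the identifier to repeated symbols keeps the labels usable as row names, which must be unique.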

sceLst <- sapply(sceLst, function(sce){
  sce <- add_gene_information(sce)
  sce
})

sceLst

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch0/data/SCEs/C133_Neeland_batch0.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 33538 34583 
metadata(1): Samples
assays(1): counts
rownames(33538): ENSG00000243485 ENSG00000237613 ... ENSG00000277475
  ENSG00000268674
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(34583): 1_AAACCCAAGCTAGTTC-1 1_AAACCCACAAGATTGA-1 ...
  4_TTTGTTGTCTAGTACG-1 4_TTTGTTGTCTCGAACA-1
colData names(5): Barcode Capture sum detected total
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(0):

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch1/data/SCEs/C133_Neeland_batch1.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 24823 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(24823): 1_AAACCCACACTTCCTG-1 1_AAACCCACAGACAAAT-1 ...
  2_TTTGTTGTCATTGGTG-1 2_TTTGTTGTCGATGGAG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch2/data/SCEs/C133_Neeland_batch2.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 53160 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(53160): 1_AAACCCAAGACCTGGA-1 1_AAACCCAAGACTGTTC-1 ...
  2_TTTGTTGTCTCATGGA-1 2_TTTGTTGTCTCCAAGA-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch3/data/SCEs/C133_Neeland_batch3.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 64842 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(64842): 1_AAACCCAAGCAGCACA-1 1_AAACCCAAGCATCTTG-1 ...
  2_TTTGTTGTCTAGGCCG-1 2_TTTGTTGTCTCGGCTT-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch4/data/SCEs/C133_Neeland_batch4.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 50208 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(50208): 1_AAACCCAAGCGTTAGG-1 1_AAACCCAAGGATTTGA-1 ...
  2_TTTGTTGTCGACGATT-1 2_TTTGTTGTCTAGGCCG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch5/data/SCEs/C133_Neeland_batch5.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 50668 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(50668): 1_AAACCCAAGAAGATCT-1 1_AAACCCAAGATGCAGC-1 ...
  2_TTTGTTGTCGGATTAC-1 2_TTTGTTGTCTGAGAGG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch6/data/SCEs/C133_Neeland_batch6.preprocessed.SCE.rds`
class: SingleCellExperiment 
dim: 36601 51119 
metadata(1): Samples
assays(1): counts
rownames(36601): ENSG00000243485 ENSG00000237613 ... ENSG00000278817
  ENSG00000277196
rowData names(20): ID Symbol ... is_mito is_pseudogene
colnames(51119): 1_AAACCCAAGAAGCGCT-1 1_AAACCCAAGACTCATC-1 ...
  2_TTTGTTGTCGAGAATA-1 2_TTTGTTGTCTACTGAG-1
colData names(11): Barcode Capture ... GeneticDonor vireo
reducedDimNames(0):
mainExpName: Gene Expression
altExpNames(2): HTO ADT

Quality control

Define the quality control metrics

Low-quality cells need to be removed to ensure that technical effects do not distort downstream analysis results. We use several quality control (QC) metrics to measure the quality of the cells:

sum: This measures the library size of the cells, which is the total sum of counts across both genes and spike-in transcripts. We want cells to have high library sizes as this means more RNA has been successfully captured during library preparation.
detected: This is the number of expressed features² in each cell. Cells with few expressed features are likely to be of poor quality, as the diverse transcript population has not been successfully captured.
subsets_Mito_percent: This measures the proportion of UMIs which are mapped to mitochondrial RNA. If there is a higher than expected proportion of mitochondrial RNA this is often symptomatic of a cell which is under stress and is therefore of low quality and will not be used for the analysis.
subsets_Ribo_percent: This measures the proportion of UMIs which are mapped to ribosomal protein genes. If there is a higher than expected proportion of ribosomal protein gene expression this is often symptomatic of a cell which is of compromised quality and we may want to exclude it from the analysis.

In summary, we aim to identify cells with low library sizes, few expressed genes, and very high percentages of mitochondrial and ribosomal protein gene expression.
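To make the metric definitions concrete, here is a small base R sketch on a made-up count matrix. The gene names, counts and the `is_mito` flag are hypothetical; the analysis itself computes these metrics with `addPerCellQC()`, and this sketch assumes a feature counts as detected when its count is above zero.

```r
# Toy count matrix: 5 genes (rows) x 3 droplets (columns); values are made up
counts <- matrix(
  c(10, 0, 5,
     0, 0, 2,
     4, 1, 0,
     6, 9, 3,
     0, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3", "MT-x", "RPL-x"), c("c1", "c2", "c3")))

# Hypothetical flag marking the mitochondrial gene
is_mito <- c(FALSE, FALSE, FALSE, TRUE, FALSE)

lib_size <- colSums(counts)      # "sum": total counts per droplet
detected <- colSums(counts > 0)  # "detected": features with a non-zero count
mito_pct <- 100 * colSums(counts[is_mito, , drop = FALSE]) / lib_size

data.frame(sum = lib_size, detected = detected, subsets_Mito_percent = mito_pct)
```

The ribosomal percentage follows the same pattern with a different logical flag.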

sceLst <- sapply(sceLst, function(sce){
  
  colData(sce) <- colData(sce)[, !str_detect(colnames(colData(sce)), 
                                             "sum|detected|percent|total")]
  sce <- addPerCellQC(sce, 
                      subsets = list(Mito = which(rowData(sce)$is_mito), 
                                     Ribo = which(rowData(sce)$is_ribo)))
  
  sce
})

Visualise the QC metrics

Figure @ref(fig:qcplot-by-genetic-donor) shows that the vast majority of samples are of good quality.

As we would expect, the doublet droplets have larger library sizes and more genes detected. The unassigned droplets generally have smaller library sizes and fewer genes detected.

# for batch 0 each capture is from a different donor
sceLst[[1]]$GeneticDonor <- sceLst[[1]]$Capture

p <- vector("list", length(sceLst))
for(i in seq_along(sceLst)){
  sce <- sceLst[[i]]
  
  
  p1 <- plotColData(
    sce,
    "sum",
    x = "GeneticDonor",
    other_fields = c("Capture"),
    colour_by = "GeneticDonor",
    point_size = 1) +
    scale_y_log10() +
    theme(axis.text.x = element_blank()) +
    geom_hline(yintercept = 500,
               linetype = "dotted") +
    annotation_logticks(
      sides = "l",
      short = unit(0.03, "cm"),
      mid = unit(0.06, "cm"),
      long = unit(0.09, "cm"))
  p2 <- plotColData(
    sce,
    "detected",
    x = "GeneticDonor",
    other_fields = c("Capture"),
    colour_by = "GeneticDonor",
    point_size = 1) +
    theme(axis.text.x = element_blank())
  p3 <- plotColData(
    sce,
    "subsets_Mito_percent",
    x = "GeneticDonor",
    other_fields = c("Capture"),
    colour_by = "GeneticDonor",
    point_size = 1) +
    theme(axis.text.x = element_blank())
  p4 <- plotColData(
    sce,
    "subsets_Ribo_percent",
    x = "GeneticDonor",
    other_fields = c("Capture"),
    colour_by = "GeneticDonor",
    point_size = 1) +
    theme(axis.text.x = element_blank())
  
  p[[i]] <- p1 + p2 + p3 + p4 + 
    plot_layout(guides = "collect", ncol = 2) +
    plot_annotation(title = glue("Batch {i-1}"))
}

p

[[1]] to [[7]]: one figure per batch (Batch 0 to Batch 6), each with four panels (sum, detected, subsets_Mito_percent, subsets_Ribo_percent) plotted by GeneticDonor.

Distributions of various QC metrics for all cells in the dataset. This includes the library sizes, number of genes detected, and percentage of reads mapped to mitochondrial genes.

Figure version: ab023d9, Jovana Maksimovic, 2024-02-27

Identify outliers by each metric

Filtering on the mitochondrial proportion can identify stressed/damaged cells and so we seek to identify droplets with unusually large mitochondrial proportions (i.e. outliers). Outlier thresholds are defined based on the number of median absolute deviations (MADs) from the median value of the metric across all cells. Here, we opt to use donor-specific thresholds to account for donor-specific differences³.
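The idea of a donor-specific MAD cutoff can be sketched in base R as follows. The values and donor labels are made up, and the use of `stats::mad()` with its default scaling constant is an assumption of this sketch rather than a statement about how `isOutlier()` is implemented.

```r
# Made-up %mito values for two hypothetical donors
mito  <- c(2, 3, 4, 5, 30, 8, 9, 10, 11, 12)
donor <- rep(c("A", "B"), each = 5)

# Upper cutoff per donor: median + 3 * MAD.
# stats::mad() multiplies the raw MAD by 1.4826 by default.
upper <- sapply(split(mito, donor), function(x) median(x) + 3 * mad(x))

# Flag droplets that exceed their own donor's cutoff
mito_drop <- mito > upper[donor]

upper
table(donor, mito_drop)
```

In this toy example only the value 30 from donor A is flagged. Donor B's value of 12 is kept even though it exceeds donor A's cutoff, which is the point of using donor-specific thresholds.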

The following table summarises the QC cutoffs:

# for batch 0, remove droplets with library size < 500 for consistency with other batches
sceLst[[1]] <- sceLst[[1]][, sceLst[[1]]$sum >= 500]

# identify % mito outliers
sceLst <- sapply(sceLst, function(sce){
  sce$mito_drop <- isOutlier(
    metric = sce$subsets_Mito_percent, 
    nmads = 3, 
    type = "higher",
    batch = sce$GeneticDonor,
    subset = !grepl("Unknown", sce$GeneticDonor))
  
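  # Tabulate the per-donor cutoffs. The column is named `lower`, but because
  # type = "higher" these values are upper cutoffs on %mito.
  data.frame(
    sample = factor(
      colnames(attributes(sce$mito_drop)$thresholds),
      levels(sce$GeneticDonor)),
    lower = attributes(sce$mito_drop)$thresholds["higher", ]) %>%
    arrange(sample) %>%
    knitr::kable(caption = "Sample-specific %mito cutoffs", digits = 1) %>%
    print()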
  
  sce
})

Sample-specific %mito cutoffs
	sample	lower
A	A	19.3
B	B	20.1
C	C	15.0
D	D	14.8

Sample-specific %mito cutoffs
	sample	lower
A	A	6.9
B	B	14.5
C	C	11.4
D	D	12.5
E	E	12.9
F	F	14.5
G	G	11.1
H	H	13.1
Doublet	Doublet	11.7
Unknown	Unknown	11.1

Sample-specific %mito cutoffs
	sample	lower
A	A	12.8
B	B	15.3
C	C	15.9
D	D	15.7
Doublet	Doublet	14.4
Unknown	Unknown	14.4

Sample-specific %mito cutoffs
	sample	lower
A	A	13.1
B	B	9.2
C	C	9.6
D	D	9.7
E	E	9.1
F	F	9.2
G	G	12.6
H	H	9.9
Doublet	Doublet	9.3
Unknown	Unknown	9.7

Sample-specific %mito cutoffs
	sample	lower
A	A	11.4
B	B	10.8
C	C	8.0
D	D	9.1
E	E	10.5
F	F	9.0
G	G	11.8
Doublet	Doublet	9.6
Unknown	Unknown	9.6

Sample-specific %mito cutoffs
	sample	lower
A	A	13.1
B	B	14.7
C	C	16.2
D	D	11.4
E	E	12.2
F	F	15.1
G	G	11.7
H	H	11.9
Doublet	Doublet	13.0
Unknown	Unknown	12.9

Sample-specific %mito cutoffs
	sample	lower
A	A	9.2
B	B	11.3
C	C	11.2
D	D	10.1
Doublet	Doublet	10.5
Unknown	Unknown	10.3

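Most droplets are retained for every assigned donor and for doublets (at least 86.9% in every batch). Retention is lower for droplets labelled Unknown (66.6% to 86.9%).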
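As a worked check of how the PercRemaining column in the summary tables relates to the other two columns, the batch 0 values can be recomputed in base R. The data frame `qc` is a hypothetical container for numbers copied from the first summary table.

```r
# Batch 0 values copied from the first summary table
qc <- data.frame(
  GeneticDonor = c("A", "B", "C", "D"),
  ByMito       = c(462, 588, 946, 994),
  Remaining    = c(3620, 4370, 9129, 9820))

# Percentage retained = remaining / (remaining + removed)
qc$PercRemaining <- round(100 * qc$Remaining / (qc$Remaining + qc$ByMito), 1)

# Sort from highest to lowest retention, as in the tables
qc[order(-qc$PercRemaining), ]
```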

sceFlt <- sapply(sceLst, function(sce){
  scePre <- sce
  keep <- !sce$mito_drop
  scePre$keep <- keep
  sce <- sce[, keep]
  
  data.frame(
    ByMito = tapply(
      scePre$mito_drop, 
      scePre$GeneticDonor, 
      sum,
      na.rm = TRUE),
    Remaining = as.vector(unname(table(sce$GeneticDonor))),
    PercRemaining = round(
      100 * as.vector(unname(table(sce$GeneticDonor))) /
        as.vector(
          unname(
            table(scePre$GeneticDonor))), 1)) %>%
    tibble::rownames_to_column("GeneticDonor") %>%
    dplyr::arrange(dplyr::desc(PercRemaining)) %>%
    knitr::kable(
      caption = "Number of droplets removed by each QC step and the number of droplets remaining.") %>%
    print()
  
  sce
})

Number of droplets removed by each QC step and the number of droplets remaining.
GeneticDonor	ByMito	Remaining	PercRemaining
D	994	9820	90.8
C	946	9129	90.6
A	462	3620	88.7
B	588	4370	88.1

Number of droplets removed by each QC step and the number of droplets remaining.
GeneticDonor	ByMito	Remaining	PercRemaining
G	174	3093	94.7
H	164	2554	94.0
Doublet	137	1904	93.3
D	211	2846	93.1
C	172	2219	92.8
F	198	2082	91.3
A	348	3450	90.8
B	218	1615	88.1
E	300	2207	88.0
Unknown	276	655	70.4

Number of droplets removed by each QC step and the number of droplets remaining.
GeneticDonor	ByMito	Remaining	PercRemaining
Doublet	368	8723	96.0
A	1039	15039	93.5
B	165	1962	92.2
C	422	4289	91.0
D	1665	15698	90.4
Unknown	496	3294	86.9

Number of droplets removed by each QC step and the number of droplets remaining.
GeneticDonor	ByMito	Remaining	PercRemaining
Doublet	377	10627	96.6
C	515	11404	95.7
E	518	10324	95.2
H	319	5815	94.8
B	488	7233	93.7
D	290	3854	93.0
F	187	2365	92.7
G	398	3887	90.7
A	370	2529	87.2
Unknown	1053	2289	68.5

Number of droplets removed by each QC step and the number of droplets remaining.
GeneticDonor	ByMito	Remaining	PercRemaining
D	460	11833	96.3
Doublet	265	6289	96.0
G	188	3911	95.4
C	325	6631	95.3
E	662	11919	94.7
A	5	66	93.0
F	258	2844	91.7
B	250	2267	90.1
Unknown	619	1416	69.6

Number of droplets removed by each QC step and the number of droplets remaining.
GeneticDonor	ByMito	Remaining	PercRemaining
Doublet	294	6169	95.5
G	549	9728	94.7
C	441	7479	94.4
H	510	8392	94.3
D	303	2729	90.0
B	159	1124	87.6
E	700	4937	87.6
F	386	2679	87.4
A	222	1478	86.9
Unknown	613	1776	74.3

Number of droplets removed by each QC step and the number of droplets remaining.
GeneticDonor	ByMito	Remaining	PercRemaining
Doublet	228	6388	96.6
D	924	14327	93.9
B	576	8652	93.8
A	736	9377	92.7
C	817	7741	90.5
Unknown	452	901	66.6

Of concern is whether the cells removed during QC preferentially derive from particular experimental groups. Reassuringly, Figure @ref(fig:barplot-highlighting-outliers) shows that this is not the case.

p <- lapply(seq_along(sceLst), function(i){
  sce <- sceLst[[i]]
  flt <- sceFlt[[i]]
  
  sce$keep <- colnames(sce) %in% colnames(flt)
  ggcells(sce) +
    geom_bar(aes(x = GeneticDonor, fill = keep)) + 
    ylab("Number of droplets") + 
    theme_cowplot(font_size = 7) + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    facet_grid(GeneticDonor ~ ., scales = "free_y")
})

p

[[1]] to [[7]]: one bar plot per batch, showing the number of droplets per GeneticDonor, filled by whether the droplet was kept.

Droplets removed during QC, stratified by Sample.

Figure version: ab023d9, Jovana Maksimovic, 2024-02-27

Finally, Figure @ref(fig:qcplot-highlighting-outliers) compares the QC metrics of the discarded and retained droplets.

p <- lapply(seq_along(sceLst), function(i){
  sce <- sceLst[[i]]
  flt <- sceFlt[[i]]
  
  sce$keep <- colnames(sce) %in% colnames(flt)
  
  p1 <- plotColData(
    sce,
    "sum",
    x = "GeneticDonor",
    colour_by = "keep",
    point_size = 0.5) +
    scale_y_log10() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    annotation_logticks(
      sides = "l",
      short = unit(0.03, "cm"),
      mid = unit(0.06, "cm"),
      long = unit(0.09, "cm"))
  p2 <- plotColData(
    sce,
    "detected",
    x = "GeneticDonor",
    colour_by = "keep",
    point_size = 0.5) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  p3 <- plotColData(
    sce,
    "subsets_Mito_percent",
    x = "GeneticDonor",
    colour_by = "keep",
    point_size = 0.5) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  p4 <- plotColData(
    sce,
    "subsets_Ribo_percent",
    x = "GeneticDonor",
    colour_by = "keep",
    point_size = 0.5) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  p1 + p2 + p3 + p4 + plot_layout(guides = "collect")
})

p

[[1]] to [[7]]: one figure per batch, each with four panels (sum, detected, subsets_Mito_percent, subsets_Ribo_percent) plotted by GeneticDonor and coloured by keep.

Distribution of QC metrics for each plate in the dataset. Each point represents a cell and is colored according to whether it was discarded during the QC process. Note that a cell will only be kept if it passes the relevant threshold for all QC metrics.

Figure version: ab023d9, Jovana Maksimovic, 2024-02-27

Filter out unassigned droplets

Remove droplets that could not be assigned using genetics.

sceFlt <- sapply(sceFlt, function(sce){
  
  sce <- sce[, sce$GeneticDonor != "Unknown"]
  sce
})

QC summary

We had already removed droplets that have unusually small library sizes or number of genes detected by the process of identifying empty droplets. We have now further removed droplets whose mitochondrial proportions we deem to be outliers.

To conclude, Figure @ref(fig:qcplot-post-outlier-removal) shows that, following QC, most samples have similar QC metrics, as is to be expected, and Figure @ref(fig:experiment-by-donor-postqc) summarises the experimental design following QC.

p <- lapply(sceFlt, function(sce){
  p1 <- plotColData(
    sce,
    "sum",
    x = "GeneticDonor",
    other_fields = c("Capture", "GeneticDonor"),
    colour_by = "GeneticDonor",
    point_size = 0.5) +
    scale_y_log10() +
    theme(axis.text.x = element_blank()) +
    annotation_logticks(
      sides = "l",
      short = unit(0.03, "cm"),
      mid = unit(0.06, "cm"),
      long = unit(0.09, "cm"))
  p2 <- plotColData(
    sce,
    "detected",
    x = "GeneticDonor",
    other_fields = c("Capture", "GeneticDonor"),
    colour_by = "GeneticDonor",
    point_size = 0.5) +
    theme(axis.text.x = element_blank())
  p3 <- plotColData(
    sce,
    "subsets_Mito_percent",
    x = "GeneticDonor",
    other_fields = c("Capture", "GeneticDonor"),
    colour_by = "GeneticDonor",
    point_size = 0.5) +
    theme(axis.text.x = element_blank())
  p4 <- plotColData(
    sce,
    "subsets_Ribo_percent",
    x = "GeneticDonor",
    other_fields = c("Capture", "GeneticDonor"),
    colour_by = "GeneticDonor",
    point_size = 0.5) +
    theme(axis.text.x = element_blank())
  p1 + p2 + p3 + p4 + plot_layout(guides = "collect", ncol = 2)
})

p

One figure per batch (batch0 to batch6), each listed under the path of that batch's preprocessed SCE file and containing four panels (sum, detected, subsets_Mito_percent, subsets_Ribo_percent) plotted by GeneticDonor.

Distributions of various QC metrics for all cells in the dataset passing QC. This includes the library sizes and proportion of reads mapped to mitochondrial genes.

Figure version: ab023d9, Jovana Maksimovic, 2024-02-27

Update batch0 object to include dmmHTO column to align with other batches.

batch <- grepl("batch0", names(sceFlt))
sceFlt[batch][[1]]$dmmHTO <- sceFlt[batch][[1]]$Capture

p <- lapply(sceFlt, function(sce){
  p1 <- ggcells(sce) + 
    geom_bar(
      aes(x = GeneticDonor, fill = dmmHTO),
      position = position_fill(reverse = TRUE)) +
    coord_flip() +
    ylab("Frequency") +
    theme_cowplot(font_size = 10) 
  p2 <- ggcells(sce) + 
    geom_bar(
      aes(x = GeneticDonor, fill = Capture),
      position = position_fill(reverse = TRUE)) +
    coord_flip() +
    ylab("Frequency") +
    theme_cowplot(font_size = 10)
  p3 <- ggcells(sce) + 
    geom_bar(aes(x = GeneticDonor, fill = GeneticDonor)) + 
    coord_flip() + 
    ylab("Number of droplets") + 
    theme_cowplot(font_size = 10) + 
    geom_text(stat = "count", aes(x = GeneticDonor, label = after_stat(count)), hjust = 1.5, size = 2) +
    guides(fill = "none")
  p1 / p2 / p3 + plot_layout(guides = "collect")
})

p

$`/Users/maksimovicjovana/Work/Projects/MCRI/melanie.neeland/paed-inflammation-CITEseq/data/C133_Neeland_batch0/data/SCEs/C133_Neeland_batch0.preprocessed.SCE.rds`