Last updated: 2022-06-16

Checks: 7 0

Knit directory: paed-cf-cite-seq/

This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20210524) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 8255c24. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/obsolete/
    Ignored:    code/obsolete/
    Ignored:    data/190930_A00152_0150_BHTYCMDSXX/
    Ignored:    data/CellRanger/
    Ignored:    data/GSE127465_RAW/
    Ignored:    data/SCEs/02_ZILIONIS.sct_normalised.SEU.rds
    Ignored:    data/SCEs/03_C133_Neeland.demultiplexed.SCE.rds
    Ignored:    data/SCEs/03_C133_Neeland.emptyDrops.SCE.rds
    Ignored:    data/SCEs/03_C133_Neeland.nuclear_fraction_calls.rds
    Ignored:    data/SCEs/03_C133_Neeland.preprocessed.SCE.rds
    Ignored:    data/SCEs/03_CF_BAL_Pilot.CellRanger_v6.SCE.rds
    Ignored:    data/SCEs/03_CF_BAL_Pilot.emptyDrops.SCE.rds
    Ignored:    data/SCEs/03_CF_BAL_Pilot.nuclear_fraction_calls.rds
    Ignored:    data/SCEs/03_CF_BAL_Pilot.preprocessed.SCE.rds
    Ignored:    data/SCEs/03_COMBO.clustered.SEU.rds
    Ignored:    data/SCEs/03_COMBO.clustered_annotated_macrophages_diet.SEU.rds
    Ignored:    data/SCEs/03_COMBO.clustered_annotated_others_diet.SEU.rds
    Ignored:    data/SCEs/03_COMBO.clustered_annotated_tcells_diet.SEU.rds
    Ignored:    data/SCEs/03_COMBO.clustered_azimuth.SEU.rds
    Ignored:    data/SCEs/03_COMBO.clustered_azimuth_v2.SEU.rds
    Ignored:    data/SCEs/03_COMBO.clustered_diet.SEU.rds
    Ignored:    data/SCEs/03_COMBO.integrated.SEU.rds
    Ignored:    data/SCEs/03_COMBO.zilionis_mapped.SEU.rds
    Ignored:    data/SCEs/04_C133_Neeland.adt_dsb_normalised.rds
    Ignored:    data/SCEs/04_C133_Neeland.adt_integrated.rds
    Ignored:    data/SCEs/04_C133_Neeland.all_integrated.SEU.rds
    Ignored:    data/SCEs/04_CF_BAL_Pilot.CellRanger_v6.SCE.rds
    Ignored:    data/SCEs/04_CF_BAL_Pilot.emptyDrops.SCE.rds
    Ignored:    data/SCEs/04_CF_BAL_Pilot.preprocessed.SCE.rds
    Ignored:    data/SCEs/04_CF_BAL_Pilot.transfer_adt.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clean_clustered.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clean_clustered.SEU_bk.rds
    Ignored:    data/SCEs/04_COMBO.clean_integrated.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clean_integrated.SEU_bk.rds
    Ignored:    data/SCEs/04_COMBO.clean_macrophages_diet.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clean_others_diet.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clean_tcells_diet.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clustered.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clustered_annotated_adt_diet.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clustered_annotated_lung_diet.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clustered_annotated_macrophages_diet.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clustered_annotated_others_diet.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clustered_annotated_tcells_diet.SEU.rds
    Ignored:    data/SCEs/04_COMBO.clustered_diet.SEU.rds
    Ignored:    data/SCEs/04_COMBO.integrated.SEU.rds
    Ignored:    data/SCEs/04_COMBO.macrophages_clustered.SEU.rds
    Ignored:    data/SCEs/04_COMBO.macrophages_integrated.SEU.rds
    Ignored:    data/SCEs/04_COMBO.others_clustered.SEU.rds
    Ignored:    data/SCEs/04_COMBO.others_integrated.SEU.rds
    Ignored:    data/SCEs/04_COMBO.tcells_clustered.SEU.rds
    Ignored:    data/SCEs/04_COMBO.tcells_integrated.SEU.rds
    Ignored:    data/SCEs/04_COMBO.zilionis_mapped.SEU.rds
    Ignored:    data/SCEs/05_CF_BAL_Pilot.transfer_adt.SEU.rds
    Ignored:    data/SCEs/05_COMBO.clean_clustered.SEU.rds
    Ignored:    data/SCEs/05_COMBO.clean_integrated.SEU.rds
    Ignored:    data/SCEs/05_COMBO.clean_macrophages_diet.SEU.rds
    Ignored:    data/SCEs/05_COMBO.clean_others_diet.SEU.rds
    Ignored:    data/SCEs/05_COMBO.clean_tcells_diet.SEU.rds
    Ignored:    data/SCEs/05_COMBO.clustered_annotated_adt_diet.SEU.rds
    Ignored:    data/SCEs/05_COMBO.clustered_annotated_lung_diet.SEU.rds
    Ignored:    data/SCEs/05_COMBO.clustered_annotated_macrophages_diet.SEU.rds
    Ignored:    data/SCEs/05_COMBO.clustered_annotated_others_diet.SEU.rds
    Ignored:    data/SCEs/05_COMBO.clustered_annotated_tcells_diet.SEU.rds
    Ignored:    data/SCEs/05_COMBO.macrophages_clustered.SEU.rds
    Ignored:    data/SCEs/05_COMBO.macrophages_integrated.SEU.rds
    Ignored:    data/SCEs/05_COMBO.others_clustered.SEU.rds
    Ignored:    data/SCEs/05_COMBO.others_integrated.SEU.rds
    Ignored:    data/SCEs/05_COMBO.tcells_clustered.SEU.rds
    Ignored:    data/SCEs/05_COMBO.tcells_integrated.SEU.rds
    Ignored:    data/SCEs/06_COMBO.clean_clustered.SEU.rds
    Ignored:    data/SCEs/06_COMBO.clean_integrated.SEU.rds
    Ignored:    data/SCEs/06_COMBO.clean_macrophages_diet.SEU.rds
    Ignored:    data/SCEs/06_COMBO.clean_others_diet.SEU.rds
    Ignored:    data/SCEs/06_COMBO.clean_tcells_diet.SEU.rds
    Ignored:    data/SCEs/06_COMBO.macrophages_clustered.SEU.rds
    Ignored:    data/SCEs/06_COMBO.macrophages_integrated.SEU.rds
    Ignored:    data/SCEs/06_COMBO.others_clustered.SEU.rds
    Ignored:    data/SCEs/06_COMBO.others_integrated.SEU.rds
    Ignored:    data/SCEs/06_COMBO.tcells_clustered.SEU.rds
    Ignored:    data/SCEs/06_COMBO.tcells_integrated.SEU.rds
    Ignored:    data/SCEs/C133_Neeland.CellRanger.SCE.rds
    Ignored:    data/SCEs/obsolete/
    Ignored:    data/emptyDrops/
    Ignored:    data/obsolete/
    Ignored:    data/sample_sheets/obsolete/
    Ignored:    output/marker-analysis/obsolete/
    Ignored:    output/obsolete/
    Ignored:    rename_captures.R
    Ignored:    renv/library/
    Ignored:    renv/staging/
    Ignored:    wflow_background.R

Unstaged changes:
    Modified:   .gitignore
    Modified:   .renvignore
    Deleted:    analysis/03_C133_Neeland.demultiplex.Rmd
    Deleted:    analysis/03_C133_Neeland.preprocess.Rmd
    Deleted:    analysis/03_COMBO.clustering_annotation.Rmd
    Deleted:    analysis/04_CF_BAL_Pilot.emptyDrops.Rmd
    Deleted:    analysis/04_CF_BAL_Pilot.preprocess.Rmd
    Deleted:    analysis/04_COMBO.transfer_proteins.Rmd
    Deleted:    analysis/05_COMBO.cluster_macrophages.Rmd
    Deleted:    analysis/05_COMBO.cluster_others.Rmd
    Deleted:    analysis/05_COMBO.cluster_tcells.Rmd
    Deleted:    analysis/05_COMBO.expression_analysis.Rmd
    Deleted:    analysis/05_COMBO.postprocess_all.Rmd
    Deleted:    analysis/05_COMBO.postprocess_macrophages.Rmd
    Deleted:    analysis/05_COMBO.postprocess_others.Rmd
    Deleted:    analysis/05_COMBO.postprocess_tcells.Rmd
    Modified:   renv/.gitignore
    Modified:   renv/settings.dcf

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/05_C133_Neeland.preprocess.Rmd) and HTML (docs/05_C133_Neeland.preprocess.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 8255c24 Jovana Maksimovic 2022-06-16 wflow_publish(c(paste0("analysis/", list.files(path = here::here("analysis"),

1 Load libraries

suppressPackageStartupMessages(library(BiocStyle))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(here))
suppressPackageStartupMessages(library(glue))
suppressPackageStartupMessages(library(DropletUtils))
suppressPackageStartupMessages(library(scran))
suppressPackageStartupMessages(library(scater))
suppressPackageStartupMessages(library(scuttle))
suppressPackageStartupMessages(library(janitor))
suppressPackageStartupMessages(library(cowplot))
suppressPackageStartupMessages(library(patchwork))
suppressPackageStartupMessages(library(scales))
suppressPackageStartupMessages(library(DropletQC))
suppressPackageStartupMessages(library(EnsDb.Hsapiens.v86))
suppressPackageStartupMessages(library(ensembldb))

2 Load data

sce <- readRDS(
  here("data", "SCEs", "03_C133_Neeland.demultiplexed.SCE.rds"))

2.1 Incorporate cell-based annotation

Cell-based annotations are included in the colData of the SingleCellExperiment. We store the counts from the hashtag features as an ‘alternative experiment’.

# Some useful colours
hto_colours <- setNames(
  unique(sce$colours$hto_colours),
  unique(names(sce$colours$hto_colours)))
genetic_donor_colours <- setNames(
  unique(sce$colours$genetic_donor_colours),
  unique(names(sce$colours$genetic_donor_colours)))
capture_colours <- setNames(
  unique(sce$colours$capture_colours),
  unique(names(sce$colours$capture_colours)))

2.1.1 Summary

p1 <- ggcells(sce) + 
  geom_bar(
    aes(x = genetic_donor, fill = HTO),
    position = position_fill(reverse = TRUE)) +
  coord_flip() +
  ylab("Frequency") +
  theme_cowplot(font_size = 10) + 
  scale_fill_manual(values = hto_colours)

p2 <- ggcells(sce) + 
  geom_bar(
    aes(x = genetic_donor, fill = capture),
    position = position_fill(reverse = TRUE)) +
  coord_flip() +
  ylab("Frequency") +
  theme_cowplot(font_size = 10) + 
  scale_fill_manual(values = capture_colours)

p3 <- ggcells(sce) + 
  geom_bar(aes(x = genetic_donor, fill = genetic_donor)) + 
  coord_flip() + 
  ylab("Number of droplets") + 
  theme_cowplot(font_size = 10) + 
  scale_fill_manual(values = genetic_donor_colours) +
  geom_text(stat='count', aes(x = genetic_donor, label=..count..), hjust=1.5, size=2) +
  guides(fill = FALSE)

p1 / p2 / p3 
Breakdown of the samples by `genetic_donor` and `capture`.

Figure 2.1: Breakdown of the samples by genetic_donor and capture.

2.2 Incorporating gene-based annotation

Having quantified gene expression against the Ensembl gene annotation, we have Ensembl-style identifiers for the genes. These identifiers are used as they are unambiguous and highly stable. However, they are difficult to interpret compared to the gene symbols which are more commonly used in the literature. Given the Ensembl identifiers, we obtain the corresponding gene symbols using annotation packages available through Bioconductor. Henceforth, we will use gene symbols (where available) to refer to genes in our analysis and otherwise use the Ensembl-style gene identifiers1.

rownames(sce) <- uniquifyFeatureNames(rowData(sce)$ID, rowData(sce)$Symbol)
# Add chromosome location so we can filter on mitochondrial genes.
location <- mapIds(
  x = EnsDb.Hsapiens.v86,
  # NOTE: Need to remove gene version number prior to lookup.
  keys = rowData(sce)$ID,
  keytype = "GENEID",
  column = "SEQNAME")
rowData(sce)$CHR <- location
# Additional gene metadata from ENSEMBL and NCBI
# NOTE: These columns were customised for this project.
ensdb_columns <- c(
  "GENEBIOTYPE", "GENENAME", "GENESEQSTART", "GENESEQEND", "SEQNAME", "SYMBOL")
names(ensdb_columns) <- paste0("ENSEMBL.", ensdb_columns)
stopifnot(all(ensdb_columns %in% columns(EnsDb.Hsapiens.v86)))
ensdb_df <- DataFrame(
  lapply(ensdb_columns, function(column) {
    mapIds(
      x = EnsDb.Hsapiens.v86,
      keys = rowData(sce)$ID,
      keytype = "GENEID",
      column = column,
      multiVals = "CharacterList")
  }),
  row.names = rowData(sce)$ID)
# NOTE: Can't look up GENEID column with GENEID key, so have to add manually.
ensdb_df$ENSEMBL.GENEID <- rowData(sce)$ID
# NOTE: Homo.sapiens combines org.Hs.eg.db and
#       TxDb.Hsapiens.UCSC.hg19.knownGene (as well as others) and therefore
#       uses entrez gene and RefSeq based data.
library(Homo.sapiens)
# NOTE: These columns were customised for this project.
ncbi_columns <- c(
  # From TxDB: None required
  # From OrgDB
  "ALIAS", "ENTREZID", "GENENAME", "REFSEQ", "SYMBOL")
names(ncbi_columns) <- paste0("NCBI.", ncbi_columns)
stopifnot(all(ncbi_columns %in% columns(Homo.sapiens)))
ncbi_df <- DataFrame(
  lapply(ncbi_columns, function(column) {
    mapIds(
      x = Homo.sapiens,
      keys = rowData(sce)$ID,
      keytype = "ENSEMBL",
      column = column,
      multiVals = "CharacterList")
  }),
  row.names = rowData(sce)$ID)
rowData(sce) <- cbind(rowData(sce), ensdb_df, ncbi_df)
# Some useful gene sets
mito_set <- rownames(sce)[which(rowData(sce)$CHR == "MT")]
ribo_set <- grep("^RP(S|L)", rownames(sce), value = TRUE)
# NOTE: A more curated approach for identifying ribosomal protein genes
#       (https://github.com/Bioconductor/OrchestratingSingleCellAnalysis-base/blob/ae201bf26e3e4fa82d9165d8abf4f4dc4b8e5a68/feature-selection.Rmd#L376-L380)
library(msigdbr)
c2_sets <- msigdbr(species = "Homo sapiens", category = "C2")
ribo_set <- union(
  ribo_set,
  c2_sets[c2_sets$gs_name == "KEGG_RIBOSOME", ]$human_gene_symbol)
is_ribo <- rownames(sce) %in% ribo_set
sex_set <- rownames(sce)[any(rowData(sce)$ENSEMBL.SEQNAME %in% c("X", "Y"))]
pseudogene_set <- rownames(sce)[
  any(grepl("pseudogene", rowData(sce)$ENSEMBL.GENEBIOTYPE))]

head(rowData(sce)) %>%
  knitr::kable()
ID Symbol Type CHR ENSEMBL.GENEBIOTYPE ENSEMBL.GENENAME ENSEMBL.GENESEQSTART ENSEMBL.GENESEQEND ENSEMBL.SEQNAME ENSEMBL.SYMBOL ENSEMBL.GENEID NCBI.ALIAS NCBI.ENTREZID NCBI.GENENAME NCBI.REFSEQ NCBI.SYMBOL
MIR1302-2HG ENSG00000243485 MIR1302-2HG Gene Expression 1 lincRNA MIR1302-2 29554 31109 1 MIR1302-2 ENSG00000243485 NA NA NA NA NA
FAM138A ENSG00000237613 FAM138A Gene Expression 1 lincRNA FAM138A 34554 36081 1 FAM138A ENSG00000237613 F379, FA…. 645520 family w…. NR_026818 FAM138A
OR4F5 ENSG00000186092 OR4F5 Gene Expression 1 protein_…. OR4F5 69091 70008 1 OR4F5 ENSG00000186092 OR4F5 79501 olfactor…. NM_00100…. OR4F5
AL627309.1 ENSG00000238009 AL627309.1 Gene Expression 1 lincRNA RP11-34P13.7 89295 133723 1 RP11-34P13.7 ENSG00000238009 LOC100996442 100996442 uncharac…. XR_00173…. LOC100996442
AL627309.3 ENSG00000239945 AL627309.3 Gene Expression 1 lincRNA RP11-34P13.8 89551 91105 1 RP11-34P13.8 ENSG00000239945 NA NA NA NA NA
AL627309.2 ENSG00000239906 AL627309.2 Gene Expression 1 antisense RP11-34P…. 139790 140339 1 RP11-34P…. ENSG00000239906 NA NA NA NA NA

3 Quality control

3.1 Define the quality control metrics

Low-quality cells need to be removed to ensure that technical effects do not distort downstream analysis results. We use several quality control (QC) metrics to measure the quality of the cells:

  • sum: This measures the library size of the cells, which is the total sum of counts across both genes and spike-in transcripts. We want cells to have high library sizes as this means more RNA has been successfully captured during library preparation.
  • detected: This is the number of expressed features2 in each cell. Cells with few expressed features are likely to be of poor quality, as the diverse transcript population has not been successful captured.
  • subsets_Mito_percent: This measures the proportion of UMIs which are mapped to mitochondrial RNA. If there is a higher than expected proportion of mitochondrial RNA this is often symptomatic of a cell which is under stress and is therefore of low quality and will not be used for the analysis.
  • subsets_Ribo_percent: This measures the proportion of UMIs which are mapped to ribosomal protein genes. If there is a higher than expected proportion of ribosomal protein gene expression this is often symptomatic of a cell which is of compromised quality and we may want to exclude it from the analysis.

In summary, we aim to identify cells with low library sizes, few expressed genes, and very high percentages of mitochondrial and ribosomal protein gene expression.

is_mito <- rownames(sce) %in% mito_set
summary(is_mito)
is_ribo <- rownames(sce) %in% ribo_set
summary(is_ribo)
sce <- addPerCellQC(
  sce, 
  subsets = list(Mito = which(is_mito), Ribo = which(is_ribo)))

3.2 Visualise the QC metrics

Figure 3.1 shows that the vast majority of samples are good-quality:

  • The median library size is around 15,8003.
  • The median number of genes detected is around 3,600.
  • The median percentage of UMIs that are mapped to mitochondrial RNA is around 3%

However, the percentage of UMIs that are mapped to ribosomal protein genes is perhaps higher than we would like at 13%.

As we would expect, the doublet droplets have larger library sizes and more genes detected. The unassigned droplets generally have smaller library sizes and fewer genes detected.

p1 <- plotColData(
  sce,
  "sum",
  x = "genetic_donor",
  other_fields = c("capture", "HTO"),
  colour_by = "genetic_donor",
  point_size = 0.5) +
  scale_y_log10() +
  scale_colour_manual(values = genetic_donor_colours, name = "genetic_donor") +
  theme(axis.text.x = element_blank()) +
  annotation_logticks(
    sides = "l",
    short = unit(0.03, "cm"),
    mid = unit(0.06, "cm"),
    long = unit(0.09, "cm"))
p2 <- plotColData(
  sce,
  "detected",
  x = "genetic_donor",
  other_fields = c("capture", "HTO"),
  colour_by = "genetic_donor",
  point_size = 0.5) +
  scale_colour_manual(values = genetic_donor_colours, name = "genetic_donor") +
  theme(axis.text.x = element_blank())
p3 <- plotColData(
  sce,
  "subsets_Mito_percent",
  x = "genetic_donor",
  other_fields = c("capture", "HTO"),
  colour_by = "genetic_donor",
  point_size = 0.5) +
  scale_colour_manual(values = genetic_donor_colours, name = "genetic_donor") +
  theme(axis.text.x = element_blank())
p4 <- plotColData(
  sce,
  "subsets_Ribo_percent",
  x = "genetic_donor",
  other_fields = c("capture", "HTO"),
  colour_by = "genetic_donor",
  point_size = 0.5) +
  scale_colour_manual(values = genetic_donor_colours, name = "genetic_donor") +
  theme(axis.text.x = element_blank())
p1 + p2 + p3 + p4 + plot_layout(guides = "collect", ncol = 2)
Distributions of various QC metrics for all cells in the dataset. This includes the library sizes, number of genes detected, and percentage of reads mapped to mitochondrial genes.

Figure 3.1: Distributions of various QC metrics for all cells in the dataset. This includes the library sizes, number of genes detected, and percentage of reads mapped to mitochondrial genes.

Figures 3.2 - 3.5 stratify each of the panels in 3.1 by capture and genetic_donor.

p1 + facet_grid(capture ~ HTO) + 
  theme(legend.position="bottom", text = element_text(size=6), axis.text.y = element_text(size=6)) 
Library size per droplet stratified by `capture` and `HTO`

Figure 3.2: Library size per droplet stratified by capture and HTO

p2 + facet_grid(capture ~ HTO) + 
  theme(legend.position="bottom", text = element_text(size=6), axis.text.y = element_text(size=6)) 
Number of genes detected per droplet stratified by `capture` and `HTO`

Figure 3.3: Number of genes detected per droplet stratified by capture and HTO

p3 + facet_grid(capture ~ HTO) + 
  theme(legend.position="bottom", text = element_text(size=6), axis.text.y = element_text(size=6)) 
Percentage of UMIs that are mapped to mitochondrial RNA per droplet stratified by `capture` and `HTO`

Figure 3.4: Percentage of UMIs that are mapped to mitochondrial RNA per droplet stratified by capture and HTO

p4 + facet_grid(capture ~ HTO) + 
  theme(legend.position="bottom", text = element_text(size=6), axis.text.y = element_text(size=6)) 
Percentage of UMIs that are mapped to ribosomal protein genes per droplet stratified by `capture` and `HTO`

Figure 3.5: Percentage of UMIs that are mapped to ribosomal protein genes per droplet stratified by capture and HTO

3.3 Identify outliers by each metric

Filtering on the mitochondrial proportion can identify stressed/damaged cells and so we seek to identify droplets with unusually large mitochondrial proportions (i.e. outliers). Outlier thresholds are defined based on the median absolute deviation (MADs) from the median value of the metric across all cells. Here, we opt to use donor-specific thresholds to account for donor-specific differences4.

sce$batch <- sce$genetic_donor

mito_drop <- isOutlier(
  metric = sce$subsets_Mito_percent, 
  nmads = 3, 
  type = "higher",
  batch = sce$batch,
  subset = !grepl("unassigned", sce$genetic_donor))
mito_drop_df <- data.frame(
  sample = factor(
    colnames(attributes(mito_drop)$thresholds),
    levels(sce$batch)),
  lower = attributes(mito_drop)$thresholds["higher", ])
ribo_drop <- isOutlier(
  metric = sce$subsets_Ribo_percent, 
  nmads = 3, 
  type = "higher",
  batch = sce$batch,
  subset = !grepl("unassigned", sce$genetic_donor))
ribo_drop_df <- data.frame(
  sample = factor(
    colnames(attributes(ribo_drop)$thresholds),
    levels(sce$batch)),
  lower = attributes(ribo_drop)$thresholds["higher", ])

The following table summarises the QC cutoffs:

qc_cutoffs_df <- dplyr::inner_join(mito_drop_df, ribo_drop_df, by = "sample")
colnames(qc_cutoffs_df) <- c("batch", "%mito", "%ribo")
inner_join(
  qc_cutoffs_df,
  distinct(as.data.frame(colData(sce)[, c("batch"), drop = FALSE])),
  by = "batch") %>%
  dplyr::select(batch, everything()) %>%
  arrange(batch) %>%
  knitr::kable(caption = "Sample-specific QC metric cutoffs", digits = 1)
Table 3.1: Sample-specific QC metric cutoffs
batch %mito %ribo
donor_A 13.3 48.1
donor_B 14.5 32.5
donor_C 11.4 26.0
donor_D 14.6 22.9
donor_E 6.9 22.6
donor_F 11.1 21.8
donor_G 12.6 38.5
donor_H 13.1 19.8
doublet 12.1 20.6
unassigned 11.3 25.4

The vast majority of cells are retained for all samples.

sce_pre_QC_outlier_removal <- sce
# TODO: Decide if excluding based on ribo
keep <- !mito_drop
sce_pre_QC_outlier_removal$keep <- keep
sce <- sce[, keep]
data.frame(
  ByMito = tapply(
    mito_drop, 
    sce_pre_QC_outlier_removal$batch, 
    sum,
    na.rm = TRUE),
  Remaining = as.vector(unname(table(sce$batch))),
  PercRemaining = round(
    100 * as.vector(unname(table(sce$batch))) /
      as.vector(
        unname(
          table(sce_pre_QC_outlier_removal$batch))), 1)) %>%
  tibble::rownames_to_column("batch") %>%
  dplyr::arrange(dplyr::desc(PercRemaining)) %>%
  knitr::kable(
    caption = "Number of samples removed by each QC step and the number of samples remaining.")
Table 3.2: Number of samples removed by each QC step and the number of samples remaining.
batch ByMito Remaining PercRemaining
donor_F 179 3113 94.6
donor_H 168 2586 93.9
doublet 149 1987 93.0
donor_C 178 2225 92.6
donor_G 235 2877 92.4
donor_D 203 2110 91.2
donor_E 356 3475 90.7
donor_B 228 1646 87.8
donor_A 330 2260 87.3
unassigned 131 392 75.0

3.3.1 Checking for removal of biologically relevant subpopulations

The biggest practical concern during QC is whether an entire cell type is inadvertently discarded. There is always some risk of this occurring as the QC metrics are never fully independent of biological state. We can diagnose cell type loss by looking for systematic differences in gene expression between the discarded and retained cells.

lost <- calculateAverage(counts(sce_pre_QC_outlier_removal)[, !keep])
kept <- calculateAverage(counts(sce_pre_QC_outlier_removal)[, keep])
library(edgeR)
logged <- cpm(cbind(lost, kept), log = TRUE, prior.count = 2)
logFC <- logged[, 1] - logged[, 2]
abundance <- rowMeans(logged)

If the discarded pool is enriched for a certain cell type, we should observe increased expression of the corresponding marker genes. Figure 3.6 shows the result of this analysis, highlighting that the systematically upregulated genes are mitochondrial transcripts and that those systematically downregulated genes are largely ribosomal protein genes. This suggests that the QC step did not inadvertently filter out an entire biologically relevant subpopulation.

is_mito <- rownames(sce) %in% mito_set
is_ribo <- rownames(sce) %in% ribo_set
par(mfrow = c(1, 1))
plot(
  abundance,
  logFC,
  xlab = "Average count",
  ylab = "Log-FC (lost/kept)",
  pch = 16)
points(
  abundance[is_mito],
  logFC[is_mito],
  col = "dodgerblue",
  pch = 16,
  cex = 1)
points(
  abundance[is_ribo],
  logFC[is_ribo],
  col = "orange",
  pch = 16,
  cex = 1)
abline(h = c(-1, 1), col = "red", lty = 2)
Log-fold change in expression in the discarded cells compared to the retained cells. Each point represents a gene with mitochondrial transcripts in blue and ribosomal protein genes in orange. Dashed red lines indicate $|logFC| = 1

Figure 3.6: Log-fold change in expression in the discarded cells compared to the retained cells. Each point represents a gene with mitochondrial transcripts in blue and ribosomal protein genes in orange. Dashed red lines indicate $|logFC| = 1

Another concern is whether the cells removed during QC preferentially derive from particular experimental groups. Reassuringly, Figure 3.7 shows that this is not the case.

ggcells(sce_pre_QC_outlier_removal) +
  geom_bar(aes(x = genetic_donor, fill = keep)) + 
  ylab("Number of droplets") + 
  theme_cowplot(font_size = 7) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_grid(genetic_donor ~ ., scales = "free_y")
Droplets removed during QC, stratified by `Sample`.

Figure 3.7: Droplets removed during QC, stratified by Sample.

Finally, Figure 3.8 compares the QC metrics of the discarded and retained droplets.

p1 <- plotColData(
  sce_pre_QC_outlier_removal,
  "sum",
  x = "genetic_donor",
  colour_by = "keep",
  point_size = 0.5) +
  scale_y_log10() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  annotation_logticks(
    sides = "l",
    short = unit(0.03, "cm"),
    mid = unit(0.06, "cm"),
    long = unit(0.09, "cm"))
p2 <- plotColData(
  sce_pre_QC_outlier_removal,
  "detected",
  x = "genetic_donor",
  colour_by = "keep",
  point_size = 0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p3 <- plotColData(
  sce_pre_QC_outlier_removal,
  "subsets_Mito_percent",
  x = "genetic_donor",
  colour_by = "keep",
  point_size = 0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p4 <- plotColData(
  sce_pre_QC_outlier_removal,
  "subsets_Ribo_percent",
  x = "genetic_donor",
  colour_by = "keep",
  point_size = 0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p1 + p2 + p3 + p4 + plot_layout(guides = "collect")
Distribution of QC metrics for each plate in the dataset. Each point represents a cell and is colored according to whether it was discarded during the QC process. Note that a cell will only be kept if it passes the relevant threshold for all QC metrics.

Figure 3.8: Distribution of QC metrics for each plate in the dataset. Each point represents a cell and is colored according to whether it was discarded during the QC process. Note that a cell will only be kept if it passes the relevant threshold for all QC metrics.

3.4 QC summary

We had already removed droplets that have unusually small library sizes or number of genes detected by the process of identifying empty droplets. We have now further removed droplets whose mitochondrial proportions we deem to be an outlier.

For the time being, we opt to retain the Unknown and unassigned droplets because these may allow us to identify a subpopulation of cells that are not captured by hashing or genetic demultiplexing.

To conclude, Figure 3.9 shows that following QC that most samples have similar QC metrics, as is to be expected, and Figure3.10 summarises the experimental design following QC.

p1 <- plotColData(
  sce,
  "sum",
  x = "genetic_donor",
  other_fields = c("capture", "genetic_donor"),
  colour_by = "genetic_donor",
  point_size = 0.5) +
  scale_y_log10() +
  scale_colour_manual(values = genetic_donor_colours, name = "genetic_donor") +
  theme(axis.text.x = element_blank()) +
  annotation_logticks(
    sides = "l",
    short = unit(0.03, "cm"),
    mid = unit(0.06, "cm"),
    long = unit(0.09, "cm"))
p2 <- plotColData(
  sce,
  "detected",
  x = "genetic_donor",
  other_fields = c("capture", "genetic_donor"),
  colour_by = "genetic_donor",
  point_size = 0.5) +
  scale_colour_manual(values = genetic_donor_colours, name = "genetic_donor") +
  theme(axis.text.x = element_blank())
p3 <- plotColData(
  sce,
  "subsets_Mito_percent",
  x = "genetic_donor",
  other_fields = c("capture", "genetic_donor"),
  colour_by = "genetic_donor",
  point_size = 0.5) +
  scale_colour_manual(values = genetic_donor_colours, name = "genetic_donor") +
  theme(axis.text.x = element_blank())
p4 <- plotColData(
  sce,
  "subsets_Ribo_percent",
  x = "genetic_donor",
  other_fields = c("capture", "genetic_donor"),
  colour_by = "genetic_donor",
  point_size = 0.5) +
  scale_colour_manual(values = genetic_donor_colours, name = "genetic_donor") +
  theme(axis.text.x = element_blank())
p1 + p2 + p3 + p4 + plot_layout(guides = "collect", ncol = 2)
Distributions of various QC metrics for all cells in the dataset passing QC. This includes the library sizes and proportion of reads mapped to mitochondrial genes.

Figure 3.9: Distributions of various QC metrics for all cells in the dataset passing QC. This includes the library sizes and proportion of reads mapped to mitochondrial genes.

p1 <- ggcells(sce) + 
  geom_bar(
    aes(x = genetic_donor, fill = HTO),
    position = position_fill(reverse = TRUE)) +
  coord_flip() +
  ylab("Frequency") +
  theme_cowplot(font_size = 10) + 
  scale_fill_manual(values = hto_colours)
p2 <- ggcells(sce) + 
  geom_bar(
    aes(x = genetic_donor, fill = capture),
    position = position_fill(reverse = TRUE)) +
  coord_flip() +
  ylab("Frequency") +
  theme_cowplot(font_size = 10) + 
  scale_fill_manual(values = capture_colours)
p3 <- ggcells(sce) + 
  geom_bar(aes(x = genetic_donor, fill = genetic_donor)) + 
  coord_flip() + 
  ylab("Number of droplets") + 
  theme_cowplot(font_size = 10) + 
  scale_fill_manual(values = genetic_donor_colours) +
  geom_text(stat='count', aes(x = genetic_donor, label=..count..), hjust=1.5, size=2) +
  guides(fill = FALSE)
p1 / p2 / p3 + plot_layout(guides = "collect")
Breakdown of the samples following QC.

Figure 3.10: Breakdown of the samples following QC.

4 Identify further empty droplets using DropletQC

DropletQC calculates, for every requested cell barcode in a provided scRNA-seq BAM file, the nuclear fraction score:

nuclear fraction = intronic reads / (intronic reads + exonic reads)

The score captures the proportion of reads from intronic regions. These RNA fragments originate from unspliced (nuclear) pre-mRNA, hence the name “nuclear fraction”. This score can be used to help identify:

  1. “Empty” droplets containing ambient RNA: low nuclear fraction score and low UMI count
  2. Droplets containing damaged cells: high nuclear fraction score and low UMI count
out <- here("data/SCEs/03_C133_Neeland.nuclear_fraction_calls.rds")
capture_names <- levels(sce$capture)

if(!file.exists(out)){
folder <- here()

nf <- lapply(capture_names, function(cn){
  message(cn)
  
  bam <- file.path(folder, 
                   "data/CellRanger",
                   cn,
                   "possorted_genome_bam.bam")
  bai <- file.path(folder, 
                   "data/CellRanger",
                   cn,
                   "possorted_genome_bam.bam.bai")
  
  nuclear_fraction_tags(bam = bam,
                        bam_index = bai,
                        verbose = FALSE,
                        barcodes = sce[["Barcode"]][sce$Sample == cn])
})
names(nf) <- capture_names

dplyr::bind_rows(lapply(nf, 
                        tibble::rownames_to_column, 
                        var = "barcode")) %>%
  dplyr::mutate(cell = paste(rep(seq_along(nf), 
                                 vapply(nf, nrow, 1L)),
                             barcode, sep = "_")) -> nf_dat
  saveRDS(nf_dat, file = out)
          
} else {
  nf_dat <- readRDS(out)
  
}

4.1 Visualise the results

Figure 4.1 shows that a distinct population of additional empty droplets has not been identified by DropletQC. All potential empty droplets will be tagged but downstream removal is unlikely.

colData(sce) %>% 
  data.frame %>% 
  rownames_to_column(var = "cell") %>%
  inner_join(nf_dat %>% 
               dplyr::select(cell, nuclear_fraction)) %>%
  column_to_rownames(var = "cell") %>%
  DataFrame -> colData(sce)

ed <- lapply(capture_names, function(cn){
  identify_empty_drops(nf_umi = colData(sce) %>% data.frame %>%
                         dplyr::filter(Sample == cn) %>%
                         dplyr::select(nuclear_fraction, sum),
                       include_plot = TRUE,
                       umi_rescue = 10^2.5,
                       nf_rescue = 0.02)
})

Add DropletQC calls to colData.

colData(sce) %>% 
  data.frame %>% 
  rownames_to_column(var = "cell") %>%
  inner_join(ed %>% bind_rows %>%
               rownames_to_column(var = "cell")) %>%
  column_to_rownames(var = "cell") %>%
  DataFrame -> colData(sce)

p <- ggcells(sce, aes(x = nuclear_fraction, y = sum)) +
  aes(colour = cell_status) +
  geom_point(size = 1, alpha = 0.25) +
  scale_y_log10() +
  facet_wrap(~Sample) +
  theme(legend.position = "bottom")
p
DropletQC results across all captures.

Figure 4.1: DropletQC results across all captures.

plotColData(sce, x = "Sample", 
            y = "sum",
            colour_by = "cell_status") +
  scale_y_log10()

5 Save data

out <- here("data", "SCEs", "03_C133_Neeland.preprocessed.SCE.rds") 

if(!file.exists(out)) saveRDS(sce, out)

6 Session info

The analysis and this document were prepared using the following software (click triangle to expand)
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.1.0 (2021-05-18)
 os       CentOS Linux 7 (Core)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_AU.UTF-8
 ctype    en_AU.UTF-8
 tz       Australia/Melbourne
 date     2022-06-16
 pandoc   2.17.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 ! package                           * version    date (UTC) lib source
 P abind                               1.4-5      2016-07-21 [?] CRAN (R 4.1.0)
 P AnnotationDbi                     * 1.56.2     2021-11-09 [?] Bioconductor
 P AnnotationFilter                  * 1.18.0     2021-10-26 [?] Bioconductor
 P assertthat                          0.2.1      2019-03-21 [?] CRAN (R 4.1.0)
 P babelgene                           21.4       2021-04-26 [?] CRAN (R 4.1.0)
 P backports                           1.4.1      2021-12-13 [?] CRAN (R 4.1.0)
 P beachmat                            2.10.0     2021-10-26 [?] Bioconductor
 P beeswarm                            0.4.0      2021-06-01 [?] CRAN (R 4.1.0)
 P Biobase                           * 2.54.0     2021-10-26 [?] Bioconductor
 P BiocFileCache                       2.2.0      2021-10-26 [?] Bioconductor
 P BiocGenerics                      * 0.40.0     2021-10-26 [?] Bioconductor
 P BiocIO                              1.4.0      2021-10-26 [?] Bioconductor
 P BiocManager                         1.30.16    2021-06-15 [?] CRAN (R 4.1.0)
 P BiocNeighbors                       1.12.0     2021-10-26 [?] Bioconductor
 P BiocParallel                        1.28.3     2021-12-09 [?] Bioconductor
 P BiocSingular                        1.10.0     2021-10-26 [?] Bioconductor
 P BiocStyle                         * 2.22.0     2021-10-26 [?] Bioconductor
 P biomaRt                             2.50.1     2021-11-21 [?] Bioconductor
 P Biostrings                          2.62.0     2021-10-26 [?] Bioconductor
 P bit                                 4.0.4      2020-08-04 [?] CRAN (R 4.1.0)
 P bit64                               4.0.5      2020-08-30 [?] CRAN (R 4.0.2)
 P bitops                              1.0-7      2021-04-24 [?] CRAN (R 4.0.2)
 P blob                                1.2.2      2021-07-23 [?] CRAN (R 4.1.0)
 P bluster                             1.4.0      2021-10-26 [?] Bioconductor
 P bookdown                            0.24       2021-09-02 [?] CRAN (R 4.1.0)
 P broom                               0.7.11     2022-01-03 [?] CRAN (R 4.1.0)
 P bslib                               0.3.1      2021-10-06 [?] CRAN (R 4.1.0)
 P cachem                              1.0.6      2021-08-19 [?] CRAN (R 4.1.0)
 P callr                               3.7.0      2021-04-20 [?] CRAN (R 4.1.0)
 P car                                 3.0-12     2021-11-06 [?] CRAN (R 4.1.0)
 P carData                             3.0-5      2022-01-06 [?] CRAN (R 4.1.0)
 P cellranger                          1.1.0      2016-07-27 [?] CRAN (R 4.1.0)
 P cli                                 3.1.0      2021-10-27 [?] CRAN (R 4.1.0)
 P cluster                             2.1.2      2021-04-17 [?] CRAN (R 4.1.0)
 P colorspace                          2.0-2      2021-06-24 [?] CRAN (R 4.0.2)
 P cowplot                           * 1.1.1      2020-12-30 [?] CRAN (R 4.0.2)
 P crayon                              1.4.2      2021-10-29 [?] CRAN (R 4.1.0)
 P curl                                4.3.2      2021-06-23 [?] CRAN (R 4.1.0)
 P DBI                                 1.1.2      2021-12-20 [?] CRAN (R 4.1.0)
 P dbplyr                              2.1.1      2021-04-06 [?] CRAN (R 4.1.0)
 P DelayedArray                        0.20.0     2021-10-26 [?] Bioconductor
 P DelayedMatrixStats                  1.16.0     2021-10-26 [?] Bioconductor
 P digest                              0.6.29     2021-12-01 [?] CRAN (R 4.1.0)
 P dplyr                             * 1.0.7      2021-06-18 [?] CRAN (R 4.1.0)
 P dqrng                               0.3.0      2021-05-01 [?] CRAN (R 4.1.0)
 P DropletQC                         * 0.0.0.9000 2021-08-06 [?] Github (powellgenomicslab/DropletQC@4409c95)
 P DropletUtils                      * 1.14.1     2021-11-08 [?] Bioconductor
 P edgeR                             * 3.36.0     2021-10-26 [?] Bioconductor
 P ellipsis                            0.3.2      2021-04-29 [?] CRAN (R 4.0.2)
 P EnsDb.Hsapiens.v86                * 2.99.0     2022-01-24 [?] Bioconductor
 P ensembldb                         * 2.18.2     2021-11-08 [?] Bioconductor
 P evaluate                            0.14       2019-05-28 [?] CRAN (R 4.0.2)
 P fansi                               1.0.0      2022-01-10 [?] CRAN (R 4.1.0)
 P farver                              2.1.0      2021-02-28 [?] CRAN (R 4.0.2)
 P fastmap                             1.1.0      2021-01-25 [?] CRAN (R 4.1.0)
 P filelock                            1.0.2      2018-10-05 [?] CRAN (R 4.1.0)
 P forcats                           * 0.5.1      2021-01-27 [?] CRAN (R 4.1.0)
 P fs                                  1.5.2      2021-12-08 [?] CRAN (R 4.1.0)
 P generics                            0.1.1      2021-10-25 [?] CRAN (R 4.1.0)
   GenomeInfoDb                      * 1.30.1     2022-01-30 [1] Bioconductor
 P GenomeInfoDbData                    1.2.7      2021-12-21 [?] Bioconductor
 P GenomicAlignments                   1.30.0     2021-10-26 [?] Bioconductor
 P GenomicFeatures                   * 1.46.3     2021-12-30 [?] Bioconductor
 P GenomicRanges                     * 1.46.1     2021-11-18 [?] Bioconductor
 P getPass                             0.2-2      2017-07-21 [?] CRAN (R 4.0.2)
 P ggbeeswarm                          0.6.0      2017-08-07 [?] CRAN (R 4.1.0)
 P ggplot2                           * 3.3.5      2021-06-25 [?] CRAN (R 4.0.2)
 P ggpubr                              0.4.0      2020-06-27 [?] CRAN (R 4.1.0)
 P ggrepel                             0.9.1      2021-01-15 [?] CRAN (R 4.1.0)
 P ggsignif                            0.6.3      2021-09-09 [?] CRAN (R 4.1.0)
 P git2r                               0.29.0     2021-11-22 [?] CRAN (R 4.1.0)
 P glue                              * 1.6.0      2021-12-17 [?] CRAN (R 4.1.0)
 P GO.db                             * 3.14.0     2021-12-21 [?] Bioconductor
 P graph                               1.72.0     2021-10-26 [?] Bioconductor
 P gridExtra                           2.3        2017-09-09 [?] CRAN (R 4.1.0)
 P gtable                              0.3.0      2019-03-25 [?] CRAN (R 4.1.0)
 P haven                               2.4.3      2021-08-04 [?] CRAN (R 4.1.0)
 P HDF5Array                           1.22.1     2021-11-14 [?] Bioconductor
 P here                              * 1.0.1      2020-12-13 [?] CRAN (R 4.0.2)
 P highr                               0.9        2021-04-16 [?] CRAN (R 4.1.0)
 P hms                                 1.1.1      2021-09-26 [?] CRAN (R 4.1.0)
 P Homo.sapiens                      * 1.3.1      2021-12-21 [?] Bioconductor
 P htmltools                           0.5.2      2021-08-25 [?] CRAN (R 4.1.0)
 P httpuv                              1.6.5      2022-01-05 [?] CRAN (R 4.1.0)
 P httr                                1.4.2      2020-07-20 [?] CRAN (R 4.1.0)
 P igraph                              1.2.11     2022-01-04 [?] CRAN (R 4.1.0)
 P IRanges                           * 2.28.0     2021-10-26 [?] Bioconductor
 P irlba                               2.3.5      2021-12-06 [?] CRAN (R 4.1.0)
 P janitor                           * 2.1.0      2021-01-05 [?] CRAN (R 4.0.2)
 P jquerylib                           0.1.4      2021-04-26 [?] CRAN (R 4.1.0)
 P jsonlite                            1.7.2      2020-12-09 [?] CRAN (R 4.0.2)
 P KEGGREST                            1.34.0     2021-10-26 [?] Bioconductor
 P KernSmooth                          2.23-20    2021-05-03 [?] CRAN (R 4.1.0)
 P knitr                               1.37       2021-12-16 [?] CRAN (R 4.1.0)
 P ks                                  1.13.3     2021-12-17 [?] CRAN (R 4.1.0)
 P labeling                            0.4.2      2020-10-20 [?] CRAN (R 4.0.2)
 P later                               1.3.0      2021-08-18 [?] CRAN (R 4.1.0)
 P lattice                             0.20-45    2021-09-22 [?] CRAN (R 4.1.0)
 P lazyeval                            0.2.2      2019-03-15 [?] CRAN (R 4.1.0)
 P lifecycle                           1.0.1      2021-09-24 [?] CRAN (R 4.1.0)
 P limma                             * 3.50.0     2021-10-26 [?] Bioconductor
 P locfit                              1.5-9.4    2020-03-25 [?] CRAN (R 4.1.0)
 P lubridate                           1.8.0      2021-10-07 [?] CRAN (R 4.1.0)
 P magrittr                            2.0.1      2020-11-17 [?] CRAN (R 4.0.2)
 P Matrix                              1.4-0      2021-12-08 [?] CRAN (R 4.1.0)
 P MatrixGenerics                    * 1.6.0      2021-10-26 [?] Bioconductor
 P matrixStats                       * 0.61.0     2021-09-17 [?] CRAN (R 4.1.0)
 P mclust                              5.4.9      2021-12-17 [?] CRAN (R 4.1.0)
 P memoise                             2.0.1      2021-11-26 [?] CRAN (R 4.1.0)
 P metapod                             1.2.0      2021-10-26 [?] Bioconductor
 P modelr                              0.1.8      2020-05-19 [?] CRAN (R 4.0.2)
 P msigdbr                           * 7.4.1      2021-05-05 [?] CRAN (R 4.1.0)
 P munsell                             0.5.0      2018-06-12 [?] CRAN (R 4.1.0)
 P mvtnorm                             1.1-3      2021-10-08 [?] CRAN (R 4.1.0)
 P org.Hs.eg.db                      * 3.14.0     2021-12-21 [?] Bioconductor
 P OrganismDbi                       * 1.36.0     2021-10-26 [?] Bioconductor
 P patchwork                         * 1.1.1      2020-12-17 [?] CRAN (R 4.0.2)
 P pillar                              1.6.4      2021-10-18 [?] CRAN (R 4.1.0)
 P pkgconfig                           2.0.3      2019-09-22 [?] CRAN (R 4.1.0)
 P png                                 0.1-7      2013-12-03 [?] CRAN (R 4.1.0)
 P pracma                              2.3.6      2021-12-07 [?] CRAN (R 4.1.0)
 P prettyunits                         1.1.1      2020-01-24 [?] CRAN (R 4.0.2)
 P processx                            3.5.2      2021-04-30 [?] CRAN (R 4.1.0)
 P progress                            1.2.2      2019-05-16 [?] CRAN (R 4.1.0)
 P promises                            1.2.0.1    2021-02-11 [?] CRAN (R 4.0.2)
 P ProtGenerics                        1.26.0     2021-10-26 [?] Bioconductor
 P ps                                  1.6.0      2021-02-28 [?] CRAN (R 4.1.0)
 P purrr                             * 0.3.4      2020-04-17 [?] CRAN (R 4.0.2)
 P R.methodsS3                         1.8.1      2020-08-26 [?] CRAN (R 4.0.2)
 P R.oo                                1.24.0     2020-08-26 [?] CRAN (R 4.0.2)
 P R.utils                             2.11.0     2021-09-26 [?] CRAN (R 4.1.0)
 P R6                                  2.5.1      2021-08-19 [?] CRAN (R 4.1.0)
 P rappdirs                            0.3.3      2021-01-31 [?] CRAN (R 4.0.2)
 P RBGL                                1.70.0     2021-10-26 [?] Bioconductor
 P Rcpp                                1.0.7      2021-07-07 [?] CRAN (R 4.1.0)
   RCurl                               1.98-1.6   2022-02-08 [1] CRAN (R 4.1.0)
 P readr                             * 2.1.1      2021-11-30 [?] CRAN (R 4.1.0)
 P readxl                              1.3.1      2019-03-13 [?] CRAN (R 4.1.0)
 P renv                                0.15.0-14  2022-01-10 [?] Github (rstudio/renv@a3b90eb)
 P reprex                              2.0.1      2021-08-05 [?] CRAN (R 4.1.0)
 P restfulr                            0.0.13     2017-08-06 [?] CRAN (R 4.1.0)
 P rhdf5                               2.38.0     2021-10-26 [?] Bioconductor
 P rhdf5filters                        1.6.0      2021-10-26 [?] Bioconductor
 P Rhdf5lib                            1.16.0     2021-10-26 [?] Bioconductor
 P rjson                               0.2.21     2022-01-09 [?] CRAN (R 4.1.0)
 P rlang                               0.4.12     2021-10-18 [?] CRAN (R 4.1.0)
 P rmarkdown                           2.11       2021-09-14 [?] CRAN (R 4.1.0)
 P rprojroot                           2.0.2      2020-11-15 [?] CRAN (R 4.0.2)
 P Rsamtools                           2.10.0     2021-10-26 [?] Bioconductor
 P RSQLite                             2.2.9      2021-12-06 [?] CRAN (R 4.1.0)
 P rstatix                             0.7.0      2021-02-13 [?] CRAN (R 4.0.2)
 P rstudioapi                          0.13       2020-11-12 [?] CRAN (R 4.0.2)
 P rsvd                                1.0.5      2021-04-16 [?] CRAN (R 4.1.0)
 P rtracklayer                         1.54.0     2021-10-26 [?] Bioconductor
 P rvest                               1.0.2      2021-10-16 [?] CRAN (R 4.1.0)
 P S4Vectors                         * 0.32.3     2021-11-21 [?] Bioconductor
 P sass                                0.4.0      2021-05-12 [?] CRAN (R 4.1.0)
 P ScaledMatrix                        1.2.0      2021-10-26 [?] Bioconductor
 P scales                            * 1.1.1      2020-05-11 [?] CRAN (R 4.0.2)
 P scater                            * 1.22.0     2021-10-26 [?] Bioconductor
 P scran                             * 1.22.1     2021-11-14 [?] Bioconductor
 P scuttle                           * 1.4.0      2021-10-26 [?] Bioconductor
 P sessioninfo                         1.2.2      2021-12-06 [?] CRAN (R 4.1.0)
 P SingleCellExperiment              * 1.16.0     2021-10-26 [?] Bioconductor
 P snakecase                           0.11.0     2019-05-25 [?] CRAN (R 4.0.2)
 P sparseMatrixStats                   1.6.0      2021-10-26 [?] Bioconductor
 P statmod                             1.4.36     2021-05-10 [?] CRAN (R 4.1.0)
 P stringi                             1.7.6      2021-11-29 [?] CRAN (R 4.1.0)
 P stringr                           * 1.4.0      2019-02-10 [?] CRAN (R 4.0.2)
 P SummarizedExperiment              * 1.24.0     2021-10-26 [?] Bioconductor
 P tibble                            * 3.1.6      2021-11-07 [?] CRAN (R 4.1.0)
 P tidyr                             * 1.1.4      2021-09-27 [?] CRAN (R 4.1.0)
 P tidyselect                          1.1.1      2021-04-30 [?] CRAN (R 4.1.0)
 P tidyverse                         * 1.3.1      2021-04-15 [?] CRAN (R 4.1.0)
 P TxDb.Hsapiens.UCSC.hg19.knownGene * 3.2.2      2022-01-10 [?] Bioconductor
 P tzdb                                0.2.0      2021-10-27 [?] CRAN (R 4.1.0)
 P utf8                                1.2.2      2021-07-24 [?] CRAN (R 4.1.0)
 P vctrs                               0.3.8      2021-04-29 [?] CRAN (R 4.0.2)
 P vipor                               0.4.5      2017-03-22 [?] CRAN (R 4.1.0)
 P viridis                             0.6.2      2021-10-13 [?] CRAN (R 4.1.0)
 P viridisLite                         0.4.0      2021-04-13 [?] CRAN (R 4.0.2)
 P whisker                             0.4        2019-08-28 [?] CRAN (R 4.0.2)
 P withr                               2.4.3      2021-11-30 [?] CRAN (R 4.1.0)
 P workflowr                         * 1.7.0      2021-12-21 [?] CRAN (R 4.1.0)
 P xfun                                0.29       2021-12-14 [?] CRAN (R 4.1.0)
 P XML                                 3.99-0.8   2021-09-17 [?] CRAN (R 4.1.0)
 P xml2                                1.3.3      2021-11-30 [?] CRAN (R 4.1.0)
 P XVector                             0.34.0     2021-10-26 [?] Bioconductor
 P yaml                                2.2.1      2020-02-01 [?] CRAN (R 4.0.2)
 P zlibbioc                            1.40.0     2021-10-26 [?] Bioconductor

 [1] /oshlack_lab/jovana.maksimovic/projects/MCRI/melanie.neeland/paed-cf-cite-seq/renv/library/R-4.1/x86_64-pc-linux-gnu
 [2] /config/binaries/R/4.1.0/lib64/R/library

 P ── Loaded and on-disk path mismatch.

──────────────────────────────────────────────────────────────────────────────

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /config/binaries/R/4.1.0/lib64/R/lib/libRblas.so
LAPACK: /config/binaries/R/4.1.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] edgeR_3.36.0                           
 [2] limma_3.50.0                           
 [3] msigdbr_7.4.1                          
 [4] Homo.sapiens_1.3.1                     
 [5] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [6] org.Hs.eg.db_3.14.0                    
 [7] GO.db_3.14.0                           
 [8] OrganismDbi_1.36.0                     
 [9] EnsDb.Hsapiens.v86_2.99.0              
[10] ensembldb_2.18.2                       
[11] AnnotationFilter_1.18.0                
[12] GenomicFeatures_1.46.3                 
[13] AnnotationDbi_1.56.2                   
[14] DropletQC_0.0.0.9000                   
[15] scales_1.1.1                           
[16] patchwork_1.1.1                        
[17] cowplot_1.1.1                          
[18] janitor_2.1.0                          
[19] scater_1.22.0                          
[20] scran_1.22.1                           
[21] scuttle_1.4.0                          
[22] DropletUtils_1.14.1                    
[23] SingleCellExperiment_1.16.0            
[24] SummarizedExperiment_1.24.0            
[25] Biobase_2.54.0                         
[26] GenomicRanges_1.46.1                   
[27] GenomeInfoDb_1.30.1                    
[28] IRanges_2.28.0                         
[29] S4Vectors_0.32.3                       
[30] BiocGenerics_0.40.0                    
[31] MatrixGenerics_1.6.0                   
[32] matrixStats_0.61.0                     
[33] glue_1.6.0                             
[34] here_1.0.1                             
[35] forcats_0.5.1                          
[36] stringr_1.4.0                          
[37] dplyr_1.0.7                            
[38] purrr_0.3.4                            
[39] readr_2.1.1                            
[40] tidyr_1.1.4                            
[41] tibble_3.1.6                           
[42] ggplot2_3.3.5                          
[43] tidyverse_1.3.1                        
[44] BiocStyle_2.22.0                       
[45] workflowr_1.7.0                        

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                ks_1.13.3                
  [3] R.utils_2.11.0            tidyselect_1.1.1         
  [5] RSQLite_2.2.9             grid_4.1.0               
  [7] BiocParallel_1.28.3       munsell_0.5.0            
  [9] ScaledMatrix_1.2.0        statmod_1.4.36           
 [11] withr_2.4.3               colorspace_2.0-2         
 [13] filelock_1.0.2            highr_0.9                
 [15] knitr_1.37                rstudioapi_0.13          
 [17] ggsignif_0.6.3            labeling_0.4.2           
 [19] git2r_0.29.0              GenomeInfoDbData_1.2.7   
 [21] farver_2.1.0              bit64_4.0.5              
 [23] rhdf5_2.38.0              rprojroot_2.0.2          
 [25] vctrs_0.3.8               generics_0.1.1           
 [27] xfun_0.29                 BiocFileCache_2.2.0      
 [29] R6_2.5.1                  ggbeeswarm_0.6.0         
 [31] rsvd_1.0.5                locfit_1.5-9.4           
 [33] bitops_1.0-7              rhdf5filters_1.6.0       
 [35] cachem_1.0.6              DelayedArray_0.20.0      
 [37] assertthat_0.2.1          promises_1.2.0.1         
 [39] BiocIO_1.4.0              beeswarm_0.4.0           
 [41] gtable_0.3.0              beachmat_2.10.0          
 [43] processx_3.5.2            rlang_0.4.12             
 [45] rstatix_0.7.0             lazyeval_0.2.2           
 [47] rtracklayer_1.54.0        broom_0.7.11             
 [49] abind_1.4-5               BiocManager_1.30.16      
 [51] yaml_2.2.1                modelr_0.1.8             
 [53] backports_1.4.1           httpuv_1.6.5             
 [55] RBGL_1.70.0               tools_4.1.0              
 [57] bookdown_0.24             ellipsis_0.3.2           
 [59] jquerylib_0.1.4           sessioninfo_1.2.2        
 [61] Rcpp_1.0.7                sparseMatrixStats_1.6.0  
 [63] progress_1.2.2            zlibbioc_1.40.0          
 [65] RCurl_1.98-1.6            ps_1.6.0                 
 [67] prettyunits_1.1.1         ggpubr_0.4.0             
 [69] viridis_0.6.2             haven_2.4.3              
 [71] ggrepel_0.9.1             cluster_2.1.2            
 [73] fs_1.5.2                  magrittr_2.0.1           
 [75] reprex_2.0.1              mvtnorm_1.1-3            
 [77] whisker_0.4               ProtGenerics_1.26.0      
 [79] hms_1.1.1                 evaluate_0.14            
 [81] XML_3.99-0.8              mclust_5.4.9             
 [83] readxl_1.3.1              gridExtra_2.3            
 [85] compiler_4.1.0            biomaRt_2.50.1           
 [87] KernSmooth_2.23-20        crayon_1.4.2             
 [89] R.oo_1.24.0               htmltools_0.5.2          
 [91] later_1.3.0               tzdb_0.2.0               
 [93] lubridate_1.8.0           DBI_1.1.2                
 [95] dbplyr_2.1.1              rappdirs_0.3.3           
 [97] babelgene_21.4            car_3.0-12               
 [99] Matrix_1.4-0              cli_3.1.0                
[101] R.methodsS3_1.8.1         parallel_4.1.0           
[103] metapod_1.2.0             igraph_1.2.11            
[105] pkgconfig_2.0.3           getPass_0.2-2            
[107] GenomicAlignments_1.30.0  xml2_1.3.3               
[109] vipor_0.4.5               bslib_0.3.1              
[111] dqrng_0.3.0               XVector_0.34.0           
[113] rvest_1.0.2               snakecase_0.11.0         
[115] callr_3.7.0               digest_0.6.29            
[117] pracma_2.3.6              graph_1.72.0             
[119] Biostrings_2.62.0         rmarkdown_2.11           
[121] cellranger_1.1.0          DelayedMatrixStats_1.16.0
[123] restfulr_0.0.13           curl_4.3.2               
[125] Rsamtools_2.10.0          rjson_0.2.21             
[127] lifecycle_1.0.1           jsonlite_1.7.2           
[129] Rhdf5lib_1.16.0           carData_3.0-5            
[131] BiocNeighbors_1.12.0      viridisLite_0.4.0        
[133] fansi_1.0.0               pillar_1.6.4             
[135] lattice_0.20-45           KEGGREST_1.34.0          
[137] fastmap_1.1.0             httr_1.4.2               
[139] png_0.1-7                 bluster_1.4.0            
[141] bit_4.0.4                 stringi_1.7.6            
[143] sass_0.4.0                HDF5Array_1.22.1         
[145] blob_1.2.2                BiocSingular_1.10.0      
[147] memoise_2.0.1             renv_0.15.0-14           
[149] irlba_2.3.5              

  1. Some care is taken to account for missing and duplicate gene symbols; missing symbols are replaced with the Ensembl identifier and duplicated symbols are concatenated with the (unique) Ensembl identifiers.↩︎

  2. The number of expressed features refers to the number of genes which have non-zero counts (i.e. they have been identified in the cell at least once)↩︎

  3. This is consistent with the use of UMI counts rather than read counts, as each transcript molecule can only produce one UMI count but can yield many reads after fragmentation.↩︎

  4. It is important to note that we only using droplets assigned to a sample (i.e. we ignore unassigned droplets) for the calculation of these thresholds.↩︎