Last updated: 2023-07-05
Checks: 6 passed, 1 failed
Knit directory: hashtag-demux-paper/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20230522) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
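As a minimal base-R illustration of why the seed matters: two runs with the same seed produce identical random draws.

```r
# Identical seeds give identical random draws
set.seed(20230522)
a <- sample(1:100, 5)

set.seed(20230522)
b <- sample(1:100, 5)

identical(a, b)  # TRUE: the draws are reproducible
```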
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.
absolute | relative |
---|---|
~/singlecell/hashtag-demux-paper/data/BAL_data/batch1_c1_hashsolo.csv | data/BAL_data/batch1_c1_hashsolo.csv |
~/singlecell/hashtag-demux-paper/data/BAL_data/batch1_c2_hashsolo.csv | data/BAL_data/batch1_c2_hashsolo.csv |
~/singlecell/hashtag-demux-paper/data/BAL_data/batch2_c1_hashsolo.csv | data/BAL_data/batch2_c1_hashsolo.csv |
~/singlecell/hashtag-demux-paper/data/BAL_data/batch2_c2_hashsolo.csv | data/BAL_data/batch2_c2_hashsolo.csv |
~/singlecell/hashtag-demux-paper/data/BAL_data/batch3_c1_hashsolo.csv | data/BAL_data/batch3_c1_hashsolo.csv |
~/singlecell/hashtag-demux-paper/data/BAL_data/batch3_c2_hashsolo.csv | data/BAL_data/batch3_c2_hashsolo.csv |
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version b8cab8b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: data/BAL_data/.DS_Store
Untracked files:
Untracked: .ipynb_checkpoints/
Untracked: Dear editors.docx
Untracked: NAR.cls
Untracked: analysis/Fscore_MCC_comparison.Rmd
Untracked: analysis/cell_line_analysis.Rmd
Untracked: analysis/gmm_demux.Rmd
Untracked: analysis/hashsolo_prep.Rmd
Untracked: analysis/run_GMM_demux_BAL.sh
Untracked: analysis/solid_tissue_analysis.Rmd
Untracked: cover_letter.docx
Untracked: data/BAL_data/batch1_all_methods.SEU.rds
Untracked: data/BAL_data/batch1_c1_donors_original.csv
Untracked: data/BAL_data/batch1_c1_hto_counts_original.csv
Untracked: data/BAL_data/batch1_c1_relabelled.SEU.rds
Untracked: data/BAL_data/batch1_c2_donors_original.csv
Untracked: data/BAL_data/batch1_c2_hto_counts_original.csv
Untracked: data/BAL_data/batch1_c2_relabelled.SEU.rds
Untracked: data/BAL_data/batch1_relabelled.SEU.rds
Untracked: data/BAL_data/batch2_all_methods.SEU.rds
Untracked: data/BAL_data/batch2_c1_donors_original.csv
Untracked: data/BAL_data/batch2_c1_hto_counts_original.csv
Untracked: data/BAL_data/batch2_c1_relabelled.SEU.rds
Untracked: data/BAL_data/batch2_c2_donors_original.csv
Untracked: data/BAL_data/batch2_c2_hto_counts_original.csv
Untracked: data/BAL_data/batch2_c2_relabelled.SEU.rds
Untracked: data/BAL_data/batch2_relabelled.SEU.rds
Untracked: data/BAL_data/batch3_all_methods.SEU.rds
Untracked: data/BAL_data/batch3_c1_donors_original.csv
Untracked: data/BAL_data/batch3_c1_hto_counts_original.csv
Untracked: data/BAL_data/batch3_c1_relabelled.SEU.rds
Untracked: data/BAL_data/batch3_c2_donors_original.csv
Untracked: data/BAL_data/batch3_c2_hto_counts_original.csv
Untracked: data/BAL_data/batch3_c2_relabelled.SEU.rds
Untracked: data/BAL_data/batch3_relabelled.SEU.rds
Untracked: data/adata/batch1_c1_hashsolo.csv
Untracked: data/adata/batch1_c1_hs_n10_d10.csv
Untracked: data/adata/batch1_c1_hs_n10_d20.csv
Untracked: data/adata/batch1_c1_hs_n10_d30.csv
Untracked: data/adata/batch1_c1_hs_n1_d10.csv
Untracked: data/adata/batch1_c1_hs_n1_d20.csv
Untracked: data/adata/batch1_c1_hs_n1_d30.csv
Untracked: data/adata/batch1_c1_hs_n5_d10.csv
Untracked: data/adata/batch1_c1_hs_n5_d30.csv
Untracked: data/adata/batch1_c2_hashsolo.csv
Untracked: data/adata/batch1_c2_hs_n10_d10.csv
Untracked: data/adata/batch1_c2_hs_n10_d20.csv
Untracked: data/adata/batch1_c2_hs_n10_d30.csv
Untracked: data/adata/batch1_c2_hs_n1_d10.csv
Untracked: data/adata/batch1_c2_hs_n1_d20.csv
Untracked: data/adata/batch1_c2_hs_n1_d30.csv
Untracked: data/adata/batch1_c2_hs_n5_d10.csv
Untracked: data/adata/batch1_c2_hs_n5_d30.csv
Untracked: data/adata/batch2_c1_hashsolo.csv
Untracked: data/adata/batch2_c1_hs_n10_d10.csv
Untracked: data/adata/batch2_c1_hs_n10_d20.csv
Untracked: data/adata/batch2_c1_hs_n10_d30.csv
Untracked: data/adata/batch2_c1_hs_n1_d10.csv
Untracked: data/adata/batch2_c1_hs_n1_d20.csv
Untracked: data/adata/batch2_c1_hs_n1_d30.csv
Untracked: data/adata/batch2_c1_hs_n5_d10.csv
Untracked: data/adata/batch2_c1_hs_n5_d30.csv
Untracked: data/adata/batch2_c2_hashsolo.csv
Untracked: data/adata/batch2_c2_hs_n10_d10.csv
Untracked: data/adata/batch2_c2_hs_n10_d20.csv
Untracked: data/adata/batch2_c2_hs_n10_d30.csv
Untracked: data/adata/batch2_c2_hs_n1_d10.csv
Untracked: data/adata/batch2_c2_hs_n1_d20.csv
Untracked: data/adata/batch2_c2_hs_n1_d30.csv
Untracked: data/adata/batch2_c2_hs_n5_d10.csv
Untracked: data/adata/batch2_c2_hs_n5_d30.csv
Untracked: data/adata/batch3_c1_hashsolo.csv
Untracked: data/adata/batch3_c1_hs_n10_d10.csv
Untracked: data/adata/batch3_c1_hs_n10_d20.csv
Untracked: data/adata/batch3_c1_hs_n10_d30.csv
Untracked: data/adata/batch3_c1_hs_n1_d10.csv
Untracked: data/adata/batch3_c1_hs_n1_d20.csv
Untracked: data/adata/batch3_c1_hs_n1_d30.csv
Untracked: data/adata/batch3_c1_hs_n5_d10.csv
Untracked: data/adata/batch3_c1_hs_n5_d30.csv
Untracked: data/adata/batch3_c2_hashsolo.csv
Untracked: data/adata/batch3_c2_hs_n10_d10.csv
Untracked: data/adata/batch3_c2_hs_n10_d20.csv
Untracked: data/adata/batch3_c2_hs_n10_d30.csv
Untracked: data/adata/batch3_c2_hs_n1_d10.csv
Untracked: data/adata/batch3_c2_hs_n1_d20.csv
Untracked: data/adata/batch3_c2_hs_n1_d30.csv
Untracked: data/adata/batch3_c2_hs_n5_d10.csv
Untracked: data/adata/batch3_c2_hs_n5_d30.csv
Untracked: data/adata/solid_tissue_batch1_hashsolo.csv
Untracked: data/adata/solid_tissue_batch2_hashsolo.csv
Untracked: data/solid_tumor_data/
Untracked: figures/QC_plots_new.png
Untracked: figures/Users/
Untracked: filter_wrong_empties.Rmd
Untracked: iscb_long_abstract.docx
Untracked: iscb_long_abstract.pdf
Untracked: oup-authoring-template/
Untracked: output/mean_fscore_mcc.xlsx
Untracked: paper_latex/
Unstaged changes:
Modified: analysis/index.Rmd
Modified: data/.DS_Store
Deleted: data/GMM-Demux/SSD_mtx/barcodes.tsv.gz
Deleted: data/GMM-Demux/SSD_mtx/features.tsv.gz
Deleted: data/GMM-Demux/SSD_mtx/matrix.mtx.gz
Deleted: data/GMM-Demux/batch1_c1_hto_counts_transpose.csv
Deleted: data/GMM-Demux/batch1_c2_hto_counts_transpose.csv
Deleted: data/GMM-Demux/batch2_c1_hto_counts_transpose.csv
Deleted: data/GMM-Demux/batch2_c2_hto_counts_transpose.csv
Deleted: data/GMM-Demux/batch3_c1_hto_counts_transpose.csv
Deleted: data/GMM-Demux/batch3_c2_hto_counts_transpose.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c1/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_LMO_c1/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c1/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_LMO_c1/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c2/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_LMO_c2/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c2/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_LMO_c2/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c3/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_LMO_c3/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c3/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_LMO_c3/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch1_c1/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch1_c1/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch1_c1/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch1_c1/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch1_c2/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch1_c2/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch1_c2/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch1_c2/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch2_c1/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch2_c1/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch2_c1/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch2_c1/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch2_c2/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch2_c2/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch2_c2/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch2_c2/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch3_c1/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch3_c1/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch3_c1/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch3_c1/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch3_c2/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch3_c2/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch3_c2/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch3_c2/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/lmo_counts_capture1_transpose.csv
Deleted: data/GMM-Demux/lmo_counts_capture2_transpose.csv
Deleted: data/GMM-Demux/lmo_counts_capture3_transpose.csv
Deleted: data/GMM-Demux/run_GMM_demux_BAL.sh
Deleted: data/GMM-Demux/run_GMM_demux_LMO.sh
Modified: data/adata/batch1_HTOs.csv
Modified: data/adata/batch1_c1_barcodes.csv
Modified: data/adata/batch1_c1_counts.mtx
Modified: data/adata/batch1_c2_barcodes.csv
Modified: data/adata/batch1_c2_counts.mtx
Modified: data/adata/batch2_HTOs.csv
Modified: data/adata/batch2_c1_barcodes.csv
Modified: data/adata/batch2_c1_counts.mtx
Modified: data/adata/batch2_c2_barcodes.csv
Modified: data/adata/batch2_c2_counts.mtx
Modified: data/adata/batch3_HTOs.csv
Modified: data/adata/batch3_c1_barcodes.csv
Modified: data/adata/batch3_c1_counts.mtx
Modified: data/adata/batch3_c2_barcodes.csv
Modified: data/adata/batch3_c2_counts.mtx
Deleted: data/batch1_c1_donors.csv
Deleted: data/batch1_c1_hto_counts.csv
Deleted: data/batch1_c2_donors.csv
Deleted: data/batch1_c2_hto_counts.csv
Deleted: data/batch1_hto_counts.csv
Deleted: data/batch2_c1_donors.csv
Deleted: data/batch2_c1_hto_counts.csv
Deleted: data/batch2_c2_donors.csv
Deleted: data/batch2_c2_hto_counts.csv
Deleted: data/batch2_hto_counts.csv
Deleted: data/batch3_c1_donors.csv
Deleted: data/batch3_c1_hto_counts.csv
Deleted: data/batch3_c2_donors.csv
Deleted: data/batch3_c2_hto_counts.csv
Deleted: data/batch3_hto_counts.csv
Deleted: data/lmo_counts.csv
Deleted: data/lmo_counts_capture1.csv
Deleted: data/lmo_counts_capture2.csv
Deleted: data/lmo_counts_capture3.csv
Deleted: data/lmo_donors.csv
Deleted: data/lmo_donors_capture1.csv
Deleted: data/lmo_donors_capture2.csv
Deleted: data/lmo_donors_capture3.csv
Modified: figures/QC_plots.png
Modified: figures/category_fractions.png
Modified: hashsolo_calls.ipynb
Modified: notebook_for_paper.Rmd
Note that any generated files, e.g. HTML, PNG, CSS, etc., are not included in this status report because it is OK for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/BAL_analysis.Rmd) and HTML (docs/BAL_analysis.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | b8cab8b | George Howitt | 2023-07-05 | wflow_publish("analysis/BAL_analysis.Rmd") |
#Load libraries
suppressPackageStartupMessages({
library(here)
library(BiocStyle)
library(dplyr)
library(janitor)
library(ggplot2)
library(cowplot)
library(patchwork)
library(DropletUtils)
library(tidyverse)
library(scuttle)
library(scater)
library(Seurat)
library(pheatmap)
library(speckle)
library(dittoSeq)
library(cellhashR)
library(RColorBrewer)
library(demuxmix)
library(ComplexHeatmap)
library(tidyHeatmap)
library(viridis)
})
This notebook contains the analysis code for all results and figures relating to the BAL data set in the paper “Benchmarking single-cell hashtag oligo demultiplexing methods”.
The data set consists of three batches, each containing eight genetically distinct samples. Each batch was processed in two separate captures. We run all the demultiplexing methods per capture, then recombine the captures into batch-level objects for later analysis.
Load the counts matrices and genetic IDs.
batch1_c1_counts <- read.csv(here("data", "BAL_data", "batch1_c1_hto_counts.csv"),
check.names = FALSE, row.names = 1)
batch1_c2_counts <- read.csv(here("data", "BAL_data", "batch1_c2_hto_counts.csv"),
check.names = FALSE, row.names = 1)
batch2_c1_counts <- read.csv(here("data", "BAL_data", "batch2_c1_hto_counts.csv"),
check.names = FALSE, row.names = 1)
batch2_c2_counts <- read.csv(here("data", "BAL_data", "batch2_c2_hto_counts.csv"),
check.names = FALSE, row.names = 1)
batch3_c1_counts <- read.csv(here("data", "BAL_data", "batch3_c1_hto_counts.csv"),
check.names = FALSE, row.names = 1)
batch3_c2_counts <- read.csv(here("data", "BAL_data", "batch3_c2_hto_counts.csv"),
check.names = FALSE, row.names = 1)
batch1_c1_donors <- read.csv(here("data", "BAL_data", "batch1_c1_donors.csv"), row.names = 1)
batch1_c2_donors <- read.csv(here("data", "BAL_data", "batch1_c2_donors.csv"), row.names = 1)
batch2_c1_donors <- read.csv(here("data", "BAL_data", "batch2_c1_donors.csv"), row.names = 1)
batch2_c2_donors <- read.csv(here("data", "BAL_data", "batch2_c2_donors.csv"), row.names = 1)
batch3_c1_donors <- read.csv(here("data", "BAL_data", "batch3_c1_donors.csv"), row.names = 1)
batch3_c2_donors <- read.csv(here("data", "BAL_data", "batch3_c2_donors.csv"), row.names = 1)
Make Seurat objects
seu1_c1 <- CreateSeuratObject(counts = batch1_c1_counts, assay = "HTO")
seu1_c2 <- CreateSeuratObject(counts = batch1_c2_counts, assay = "HTO")
seu2_c1 <- CreateSeuratObject(counts = batch2_c1_counts, assay = "HTO")
seu2_c2 <- CreateSeuratObject(counts = batch2_c2_counts, assay = "HTO")
seu3_c1 <- CreateSeuratObject(counts = batch3_c1_counts, assay = "HTO")
seu3_c2 <- CreateSeuratObject(counts = batch3_c2_counts, assay = "HTO")
seu1_c1$Barcode <- colnames(seu1_c1)
seu1_c1$capture <- "capture_1"
seu1_c2$Barcode <- colnames(seu1_c2)
seu1_c2$capture <- "capture_2"
seu2_c1$Barcode <- colnames(seu2_c1)
seu2_c1$capture <- "capture_1"
seu2_c2$Barcode <- colnames(seu2_c2)
seu2_c2$capture <- "capture_2"
seu3_c1$Barcode <- colnames(seu3_c1)
seu3_c1$capture <- "capture_1"
seu3_c2$Barcode <- colnames(seu3_c2)
seu3_c2$capture <- "capture_2"
Add genetic donor information to Seurat objects
seu1_c1$genetic_donor <- batch1_c1_donors$genetic_donor
seu1_c2$genetic_donor <- batch1_c2_donors$genetic_donor
seu2_c1$genetic_donor <- batch2_c1_donors$genetic_donor
seu2_c2$genetic_donor <- batch2_c2_donors$genetic_donor
seu3_c1$genetic_donor <- batch3_c1_donors$genetic_donor
seu3_c2$genetic_donor <- batch3_c2_donors$genetic_donor
Merge into batch-level objects
seu1 <- merge(seu1_c1, seu1_c2)
seu2 <- merge(seu2_c1, seu2_c2)
seu3 <- merge(seu3_c1, seu3_c2)
For each batch we need lists associating the HTOs with the genetic donors.
HTO_list_batch1 <- c("BAL 01", "BAL 02", "BAL 03", "BAL 04", "BAL 05",
"BAL 06", "BAL 07", "BAL 08", "Doublet", "Negative")
donor_list_batch1 <- list("BAL A" = "BAL 01",
"BAL B" = "BAL 02",
"BAL C" = "BAL 03",
"BAL D" = "BAL 04",
"BAL E" = "BAL 05",
"BAL F" = "BAL 06",
"BAL G" = "BAL 07",
"BAL H" = "BAL 08",
"Doublet" = "Doublet",
"Negative" = "Negative")
HTO_donor_list_batch1 <- list("BAL 01" = "BAL A",
"BAL 02" = "BAL B",
"BAL 03" = "BAL C",
"BAL 04" = "BAL D",
"BAL 05" = "BAL E",
"BAL 06" = "BAL F",
"BAL 07" = "BAL G",
"BAL 08" = "BAL H",
"Doublet" = "Doublet",
"Negative" = "Negative")
HTO_list_batch2 <- c("BAL 09", "BAL 10", "BAL 11", "BAL 12", "BAL 13",
"BAL 14", "BAL 15", "BAL 16", "Doublet", "Negative")
donor_list_batch2 <- list("BAL I" = "BAL 09",
"BAL J" = "BAL 10",
"BAL K" = "BAL 11",
"BAL L" = "BAL 12",
"BAL M" = "BAL 13",
"BAL N" = "BAL 14",
"BAL O" = "BAL 15",
"BAL P" = "BAL 16",
"Doublet" = "Doublet",
"Negative" = "Negative")
HTO_donor_list_batch2 <- list("BAL 09" = "BAL I",
"BAL 10" = "BAL J",
"BAL 11" = "BAL K",
"BAL 12" = "BAL L",
"BAL 13" = "BAL M",
"BAL 14" = "BAL N",
"BAL 15" = "BAL O",
"BAL 16" = "BAL P",
"Doublet" = "Doublet",
"Negative" = "Negative")
HTO_list_batch3 <- c("BAL 17", "BAL 18", "BAL 19", "BAL 20", "BAL 21",
"BAL 22", "BAL 23", "BAL 24", "Doublet", "Negative")
donor_list_batch3 <- list("BAL Q" = "BAL 17",
"BAL R" = "BAL 18",
"BAL S" = "BAL 19",
"BAL T" = "BAL 20",
"BAL U" = "BAL 21",
"BAL V" = "BAL 22",
"BAL W" = "BAL 23",
"BAL X" = "BAL 24",
"Doublet" = "Doublet",
"Negative" = "Negative")
HTO_donor_list_batch3 <- list("BAL 17" = "BAL Q",
"BAL 18" = "BAL R",
"BAL 19" = "BAL S",
"BAL 20" = "BAL T",
"BAL 21" = "BAL U",
"BAL 22" = "BAL V",
"BAL 23" = "BAL W",
"BAL 24" = "BAL X",
"Doublet" = "Doublet",
"Negative" = "Negative")
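These lookup lists are used later to translate per-cell HTO-level calls into donor labels so they can be compared with the genotype-based ground truth. A minimal sketch of the intended use, with a truncated stand-in for HTO_donor_list_batch1 and a hypothetical calls vector:

```r
# Stand-in for the full HTO -> donor lookup defined above (truncated)
HTO_donor_list <- list("BAL 01" = "BAL A",
                       "BAL 02" = "BAL B",
                       "Doublet" = "Doublet",
                       "Negative" = "Negative")

# Hypothetical per-cell HTO calls, e.g. from hashedDrops
calls <- c("BAL 01", "BAL 02", "Negative", "BAL 01")

# Indexing the unlisted (named) lookup by call name translates HTOs to donors
donor_calls <- unname(unlist(HTO_donor_list)[calls])
donor_calls  # "BAL A" "BAL B" "Negative" "BAL A"
```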
Compute PCA and t-SNE embeddings
DefaultAssay(seu1) <- "HTO"
seu1 <- NormalizeData(seu1, assay = "HTO", normalization.method = "CLR")
Normalizing across features
seu1 <- ScaleData(seu1, features = rownames(seu1),
verbose = FALSE)
seu1 <- RunPCA(seu1, features = rownames(seu1), approx = FALSE, verbose = FALSE)
seu1 <- RunTSNE(seu1, dims = 1:8, perplexity = 100, check_duplicates = FALSE, verbose = FALSE)
DefaultAssay(seu2) <- "HTO"
seu2 <- NormalizeData(seu2, assay = "HTO", normalization.method = "CLR")
Normalizing across features
seu2 <- ScaleData(seu2, features = rownames(seu2),
verbose = FALSE)
seu2 <- RunPCA(seu2, features = rownames(seu2), approx = FALSE, verbose = FALSE)
seu2 <- RunTSNE(seu2, dims = 1:8, perplexity = 100, check_duplicates = FALSE, verbose = FALSE)
DefaultAssay(seu3) <- "HTO"
seu3 <- NormalizeData(seu3, assay = "HTO", normalization.method = "CLR")
Normalizing across features
seu3 <- ScaleData(seu3, features = rownames(seu3),
verbose = FALSE)
seu3 <- RunPCA(seu3, features = rownames(seu3), approx = FALSE, verbose = FALSE)
seu3 <- RunTSNE(seu3, dims = 1:8, perplexity = 100, check_duplicates = FALSE, verbose = FALSE)
Same for individual captures
DefaultAssay(seu1_c1) <- "HTO"
seu1_c1 <- NormalizeData(seu1_c1, assay = "HTO", normalization.method = "CLR")
Normalizing across features
seu1_c1 <- ScaleData(seu1_c1, features = rownames(seu1_c1),
verbose = FALSE)
seu1_c1 <- RunPCA(seu1_c1, features = rownames(seu1_c1), approx = FALSE, verbose = FALSE)
seu1_c1 <- RunTSNE(seu1_c1, dims = 1:8, perplexity = 100, check_duplicates = FALSE, verbose = FALSE)
DefaultAssay(seu1_c2) <- "HTO"
seu1_c2 <- NormalizeData(seu1_c2, assay = "HTO", normalization.method = "CLR")
Normalizing across features
seu1_c2 <- ScaleData(seu1_c2, features = rownames(seu1_c2),
verbose = FALSE)
seu1_c2 <- RunPCA(seu1_c2, features = rownames(seu1_c2), approx = FALSE, verbose = FALSE)
seu1_c2 <- RunTSNE(seu1_c2, dims = 1:8, perplexity = 100, check_duplicates = FALSE, verbose = FALSE)
seu2_c1 <- NormalizeData(seu2_c1, assay = "HTO", normalization.method = "CLR")
Normalizing across features
seu2_c1 <- ScaleData(seu2_c1, features = rownames(seu2_c1),
verbose = FALSE)
seu2_c1 <- RunPCA(seu2_c1, features = rownames(seu2_c1), approx = FALSE, verbose = FALSE)
seu2_c1 <- RunTSNE(seu2_c1, dims = 1:8, perplexity = 100, check_duplicates = FALSE, verbose = FALSE)
DefaultAssay(seu2_c2) <- "HTO"
seu2_c2 <- NormalizeData(seu2_c2, assay = "HTO", normalization.method = "CLR")
Normalizing across features
seu2_c2 <- ScaleData(seu2_c2, features = rownames(seu2_c2),
verbose = FALSE)
seu2_c2 <- RunPCA(seu2_c2, features = rownames(seu2_c2), approx = FALSE, verbose = FALSE)
seu2_c2 <- RunTSNE(seu2_c2, dims = 1:8, perplexity = 100, check_duplicates = FALSE, verbose = FALSE)
seu3_c1 <- NormalizeData(seu3_c1, assay = "HTO", normalization.method = "CLR")
Normalizing across features
seu3_c1 <- ScaleData(seu3_c1, features = rownames(seu3_c1),
verbose = FALSE)
seu3_c1 <- RunPCA(seu3_c1, features = rownames(seu3_c1), approx = FALSE, verbose = FALSE)
seu3_c1 <- RunTSNE(seu3_c1, dims = 1:8, perplexity = 100, check_duplicates = FALSE, verbose = FALSE)
DefaultAssay(seu3_c2) <- "HTO"
seu3_c2 <- NormalizeData(seu3_c2, assay = "HTO", normalization.method = "CLR")
Normalizing across features
seu3_c2 <- ScaleData(seu3_c2, features = rownames(seu3_c2),
verbose = FALSE)
seu3_c2 <- RunPCA(seu3_c2, features = rownames(seu3_c2), approx = FALSE, verbose = FALSE)
seu3_c2 <- RunTSNE(seu3_c2, dims = 1:8, perplexity = 100, check_duplicates = FALSE, verbose = FALSE)
Density plots per barcode. In ideal conditions the density of the hashtag counts should appear bimodal, with a lower peak corresponding to the background and the higher peak corresponding to the signal.
df1 <- as.data.frame(t(seu1[["HTO"]]@counts))
df1 %>%
pivot_longer(cols = starts_with("BAL")) %>%
mutate(logged = log(value + 1)) %>%
ggplot(aes(x = logged)) +
xlab("log(counts)") +
xlim(0.1,8) +
ggtitle("Batch 1") +
geom_density(adjust = 5) +
facet_wrap(~name, scales = "fixed", ncol = 4) +
theme(axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks = element_blank()) -> p1
df2 <- as.data.frame(t(seu2[["HTO"]]@counts))
df2 %>%
pivot_longer(cols = starts_with("BAL")) %>%
mutate(logged = log(value + 1)) %>%
ggplot(aes(x = logged)) +
xlab("log(counts)") +
xlim(0.1, 8) +
ggtitle("Batch 2") +
geom_density(adjust = 5) +
facet_wrap(~name, scales = "fixed", ncol = 4) +
theme(axis.title.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()) -> p2
df3 <- as.data.frame(t(seu3[["HTO"]]@counts))
df3 %>%
pivot_longer(cols = starts_with("BAL")) %>%
mutate(logged = log(value + 1)) %>%
ggplot(aes(x = logged)) +
xlab("log(counts)") +
xlim(0.1,8) +
ggtitle("Batch 3") +
geom_density(adjust = 5) +
facet_wrap(~name, scales = "fixed", ncol = 4) +
theme(axis.ticks = element_blank()) -> p3
(p1 / p2 / p3) + plot_annotation(tag_levels = 'a')
Warning: Removed 2242 rows containing non-finite values (`stat_density()`).
Warning: Removed 1241 rows containing non-finite values (`stat_density()`).
Warning: Removed 4490 rows containing non-finite values (`stat_density()`).
tSNE plots
p4 <- DimPlot(seu1, group.by = "genetic_donor") +
ggtitle("Batch 1") +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(),
axis.text.y = element_blank(), axis.ticks.y = element_blank(),
axis.line.x = element_blank(), axis.line.y = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank(),
legend.text=element_text(size = 5),
plot.margin = margin(0,0,0,0,"cm"))
p5 <- DimPlot(seu2, group.by = "genetic_donor") +
ggtitle("Batch 2") +
xlab("") +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(),
axis.text.y = element_blank(), axis.ticks.y = element_blank(),
axis.line.x = element_blank(), axis.line.y = element_blank(),
axis.title.x = element_blank(),
legend.text=element_text(size=5),
plot.margin = margin(0,0,0,0,"cm"))
p6 <- DimPlot(seu3, group.by = "genetic_donor") +
ggtitle("Batch 3") +
ylab("") +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(),
axis.text.y = element_blank(), axis.ticks.y = element_blank(),
axis.line.x = element_blank(), axis.line.y = element_blank(),
axis.title.y = element_blank(),
legend.text=element_text(size = 5),
plot.margin = margin(0,0,0,0,"cm"))
p4 / p5 / p6
(((p1 / p2 / p3) | (p4 / p5 / p6)) + plot_annotation(tag_levels = 'a')) &
theme(plot.title = element_text(face = "plain", size = 10),
plot.tag = element_text(face = 'plain'))
Warning: Removed 2242 rows containing non-finite values (`stat_density()`).
Warning: Removed 1241 rows containing non-finite values (`stat_density()`).
Warning: Removed 4490 rows containing non-finite values (`stat_density()`).
#ggsave(here("paper_latex", "figures", "QC_plots_new.png"),
#       plot = (((p1 / p2 / p3) | (p4 / p5 / p6)) + plot_annotation(tag_levels = 'a')) &
#         theme(plot.title = element_text(face = "plain", size = 10),
#               plot.tag = element_text(face = 'plain')),
#       device = "png",
#       width = 10, height = 9,
#       units = "in",
#       dpi = 300)
Checking if there’s any noticeable difference between the combined results and the capture-separated results.
p4 <- DimPlot(seu1_c1, group.by = "genetic_donor") +
ggtitle("Batch 1 (capture 1)") +
NoLegend() +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(),
axis.text.y = element_blank(), axis.ticks.y = element_blank(),
axis.line.x = element_blank(), axis.line.y = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank(),
plot.margin = margin(0,0,0,0,"cm"))
p5 <- DimPlot(seu2_c1, group.by = "genetic_donor") +
ggtitle("Batch 2 (capture 1)") +
xlab("") +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(),
axis.text.y = element_blank(), axis.ticks.y = element_blank(),
axis.line.x = element_blank(), axis.line.y = element_blank(),
axis.title.x = element_blank(),
plot.margin = margin(0,0,0,0,"cm"))
p6 <- DimPlot(seu3_c1, group.by = "genetic_donor") +
ggtitle("Batch 3 (capture 1)") +
NoLegend() +
ylab("") +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(),
axis.text.y = element_blank(), axis.ticks.y = element_blank(),
axis.line.x = element_blank(), axis.line.y = element_blank(),
axis.title.y = element_blank(),
plot.margin = margin(0,0,0,0,"cm"))
p7 <- DimPlot(seu1_c2, group.by = "genetic_donor") +
ggtitle("Batch 1 (capture 2)") +
NoLegend() +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(),
axis.text.y = element_blank(), axis.ticks.y = element_blank(),
axis.line.x = element_blank(), axis.line.y = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank(),
plot.margin = margin(0,0,0,0,"cm"))
p8 <- DimPlot(seu2_c2, group.by = "genetic_donor") +
ggtitle("Batch 2 (capture 2)") +
xlab("") +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(),
axis.text.y = element_blank(), axis.ticks.y = element_blank(),
axis.line.x = element_blank(), axis.line.y = element_blank(),
axis.title.x = element_blank(),
plot.margin = margin(0,0,0,0,"cm"))
p9 <- DimPlot(seu3_c2, group.by = "genetic_donor") +
ggtitle("Batch 3 (capture 2)") +
NoLegend() +
ylab("") +
theme(axis.text.x = element_blank(), axis.ticks.x = element_blank(),
axis.text.y = element_blank(), axis.ticks.y = element_blank(),
axis.line.x = element_blank(), axis.line.y = element_blank(),
axis.title.y = element_blank(),
plot.margin = margin(0,0,0,0,"cm"))
(p1 / p2 / p3) | (p4 / p5 / p6) | (p7 / p8 / p9)
Warning: Removed 2242 rows containing non-finite values (`stat_density()`).
Warning: Removed 1241 rows containing non-finite values (`stat_density()`).
Warning: Removed 4490 rows containing non-finite values (`stat_density()`).
This is where we run each of the demultiplexing methods on each capture of each batch.
This function creates a factor of hashedDrops calls. Its default arguments match those of hashedDrops.
create_hashedDrops_factor <- function(seurat_object, confident.min = 2,
doublet.nmads = 3, doublet.min = 2) {
hto_counts <- GetAssayData(seurat_object[["HTO"]], slot = "counts")
hash_stats <- DropletUtils::hashedDrops(hto_counts, confident.min = confident.min,
doublet.nmads = doublet.nmads, doublet.min = doublet.min)
hash_stats$Best <- rownames(seurat_object[["HTO"]])[hash_stats$Best]
hash_stats$Second <- rownames(seurat_object[["HTO"]])[hash_stats$Second]
HTO_assignments <- factor(case_when(
hash_stats$Confident == TRUE ~ hash_stats$Best,
hash_stats$Doublet == TRUE ~ "Doublet",
TRUE ~ "Negative"))
return(HTO_assignments)
}
Make factors with the best-performing parameters
seu1_c1$hashedDrops_calls <- create_hashedDrops_factor(seu1_c1, confident.min = 0.5, doublet.nmads = 3, doublet.min = 2)
seu1_c2$hashedDrops_calls <- create_hashedDrops_factor(seu1_c2, confident.min = 0.5, doublet.nmads = 3, doublet.min = 2)
seu2_c1$hashedDrops_calls <- create_hashedDrops_factor(seu2_c1, confident.min = 0.5, doublet.nmads = 3, doublet.min = 2)
seu2_c2$hashedDrops_calls <- create_hashedDrops_factor(seu2_c2, confident.min = 0.5, doublet.nmads = 3, doublet.min = 2)
seu3_c1$hashedDrops_calls <- create_hashedDrops_factor(seu3_c1, confident.min = 0.5, doublet.nmads = 3, doublet.min = 2)
seu3_c2$hashedDrops_calls <- create_hashedDrops_factor(seu3_c2, confident.min = 0.5, doublet.nmads = 3, doublet.min = 2)
Now with default parameters
seu1_c1$hashedDrops_default_calls <- create_hashedDrops_factor(seu1_c1)
seu1_c2$hashedDrops_default_calls <- create_hashedDrops_factor(seu1_c2)
seu2_c1$hashedDrops_default_calls <- create_hashedDrops_factor(seu2_c1)
seu2_c2$hashedDrops_default_calls <- create_hashedDrops_factor(seu2_c2)
seu3_c1$hashedDrops_default_calls <- create_hashedDrops_factor(seu3_c1)
seu3_c2$hashedDrops_default_calls <- create_hashedDrops_factor(seu3_c2)
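Downstream, each method's calls are benchmarked against the genotype-based ground truth. A toy sketch of that comparison, with hypothetical vectors standing in for donor-level calls and the genetic_donor metadata:

```r
# Hypothetical donor-level calls and genetic ground truth for five cells
truth <- c("BAL A", "BAL A", "BAL B", "BAL B", "Doublet")
calls <- c("BAL A", "Negative", "BAL B", "BAL A", "Doublet")

# Confusion matrix: rows are the method's calls, columns the truth
table(Called = calls, Truth = truth)

# Overall agreement between calls and truth
mean(calls == truth)  # 0.6
```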
HashSolo is a Python (Scanpy-based) method, so the data need some preparation: we write them out to AnnData-compatible files.
Counts
library(Matrix)
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
The following object is masked from 'package:S4Vectors':
expand
writeMM(seu1_c1@assays$HTO@counts, here("data", "BAL_data", "adata", "batch1_c1_counts.mtx"))
NULL
writeMM(seu1_c2@assays$HTO@counts, here("data", "BAL_data", "adata", "batch1_c2_counts.mtx"))
NULL
writeMM(seu2_c1@assays$HTO@counts, here("data", "BAL_data", "adata", "batch2_c1_counts.mtx"))
NULL
writeMM(seu2_c2@assays$HTO@counts, here("data", "BAL_data", "adata", "batch2_c2_counts.mtx"))
NULL
writeMM(seu3_c1@assays$HTO@counts, here("data", "BAL_data", "adata", "batch3_c1_counts.mtx"))
NULL
writeMM(seu3_c2@assays$HTO@counts, here("data", "BAL_data", "adata", "batch3_c2_counts.mtx"))
NULL
Barcodes
barcodes <- data.frame(colnames(seu1_c1))
colnames(barcodes) <- 'Barcode'
write.csv(barcodes, here("data", "BAL_data", "adata", "batch1_c1_barcodes.csv"),
          quote = FALSE, row.names = FALSE)
barcodes <- data.frame(colnames(seu1_c2))
colnames(barcodes) <- 'Barcode'
write.csv(barcodes, here("data", "BAL_data", "adata", "batch1_c2_barcodes.csv"),
          quote = FALSE, row.names = FALSE)
barcodes <- data.frame(colnames(seu2_c1))
colnames(barcodes) <- 'Barcode'
write.csv(barcodes, here("data", "BAL_data", "adata", "batch2_c1_barcodes.csv"),
          quote = FALSE, row.names = FALSE)
barcodes <- data.frame(colnames(seu2_c2))
colnames(barcodes) <- 'Barcode'
write.csv(barcodes, here("data", "BAL_data", "adata", "batch2_c2_barcodes.csv"),
          quote = FALSE, row.names = FALSE)
barcodes <- data.frame(colnames(seu3_c1))
colnames(barcodes) <- 'Barcode'
write.csv(barcodes, here("data", "BAL_data", "adata", "batch3_c1_barcodes.csv"),
          quote = FALSE, row.names = FALSE)
barcodes <- data.frame(colnames(seu3_c2))
colnames(barcodes) <- 'Barcode'
write.csv(barcodes, here("data", "BAL_data", "adata", "batch3_c2_barcodes.csv"),
          quote = FALSE, row.names = FALSE)
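The six near-identical blocks above all follow one pattern: pull the cell barcodes from a capture and write them to a one-column CSV with a "Barcode" header. A minimal base-R sketch of that pattern, using a hypothetical named list of barcode vectors in place of the Seurat objects:

```r
# Hypothetical stand-in for the captures: names become output file stems.
captures <- list(
  batch1_c1 = c("AAACCCAAGG-1", "AAACGCTTCA-1"),
  batch1_c2 = c("AAAGAACCAT-1")
)

out_dir <- tempdir()
for (capture in names(captures)) {
  barcodes <- data.frame(Barcode = captures[[capture]])
  # quote = FALSE / row.names = FALSE gives a plain one-column CSV.
  write.csv(barcodes,
            file.path(out_dir, paste0(capture, "_barcodes.csv")),
            quote = FALSE, row.names = FALSE)
}
```

Each output file is a "Barcode" header line followed by one barcode per line, matching the files written above.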
Save HTO names (just need one capture per batch)
HTOs <- data.frame(rownames(seu1_c1))
colnames(HTOs) <- 'HTO'
write.csv(HTOs, here("data", "BAL_data", "adata", "batch1_HTOs.csv"),
          quote = FALSE, row.names = FALSE)
HTOs <- data.frame(rownames(seu2_c1))
colnames(HTOs) <- 'HTO'
write.csv(HTOs, here("data", "BAL_data", "adata", "batch2_HTOs.csv"),
          quote = FALSE, row.names = FALSE)
HTOs <- data.frame(rownames(seu3_c1))
colnames(HTOs) <- 'HTO'
write.csv(HTOs, here("data", "BAL_data", "adata", "batch3_HTOs.csv"),
          quote = FALSE, row.names = FALSE)
See hashsolo_calls.ipynb for how we get these assignments
seu1_c1$hashsolo_calls <- read.csv(here("data", "BAL_data", "batch1_c1_hashsolo.csv"))$Classification
seu1_c2$hashsolo_calls <- read.csv(here("data", "BAL_data", "batch1_c2_hashsolo.csv"))$Classification
seu2_c1$hashsolo_calls <- read.csv(here("data", "BAL_data", "batch2_c1_hashsolo.csv"))$Classification
seu2_c2$hashsolo_calls <- read.csv(here("data", "BAL_data", "batch2_c2_hashsolo.csv"))$Classification
seu3_c1$hashsolo_calls <- read.csv(here("data", "BAL_data", "batch3_c1_hashsolo.csv"))$Classification
seu3_c2$hashsolo_calls <- read.csv(here("data", "BAL_data", "batch3_c2_hashsolo.csv"))$Classification
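read.csv() returns rows in file order, so the assignments above implicitly assume each HashSolo CSV lists cells in the same order as the corresponding Seurat object. A small base-R sketch of a defensive alternative that aligns on barcode; the Barcode column here is hypothetical (the real CSVs may only carry the classification):

```r
# Hypothetical calls table, possibly in a different order than the object.
object_barcodes <- c("AAA-1", "CCC-1", "GGG-1")
hashsolo <- data.frame(
  Barcode = c("CCC-1", "AAA-1", "GGG-1"),
  Classification = c("BAL 02", "BAL 01", "Doublet")
)

# match() gives, for each object barcode, its row in the calls table.
aligned <- hashsolo$Classification[match(object_barcodes, hashsolo$Barcode)]
aligned
```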
HDmux <- HTODemux(seu1_c1)
Cutoff for BAL 01 : 74 reads
Cutoff for BAL 02 : 103 reads
Cutoff for BAL 03 : 80 reads
Cutoff for BAL 04 : 30 reads
Cutoff for BAL 05 : 52 reads
Cutoff for BAL 06 : 23 reads
Cutoff for BAL 07 : 36 reads
Cutoff for BAL 08 : 57 reads
seu1_c1$HTODemux_calls <- HDmux$hash.ID
HDmux <- HTODemux(seu1_c2)
Cutoff for BAL 01 : 93 reads
Cutoff for BAL 02 : 164 reads
Cutoff for BAL 03 : 75 reads
Cutoff for BAL 04 : 32 reads
Cutoff for BAL 05 : 71 reads
Cutoff for BAL 06 : 38 reads
Cutoff for BAL 07 : 40 reads
Cutoff for BAL 08 : 48 reads
seu1_c2$HTODemux_calls <- HDmux$hash.ID
HDmux <- HTODemux(seu2_c1)
Cutoff for BAL 09 : 50 reads
Cutoff for BAL 10 : 29 reads
Cutoff for BAL 11 : 25 reads
Cutoff for BAL 12 : 94 reads
Cutoff for BAL 13 : 25 reads
Cutoff for BAL 14 : 16 reads
Cutoff for BAL 15 : 131 reads
Cutoff for BAL 16 : 77 reads
seu2_c1$HTODemux_calls <- HDmux$hash.ID
HDmux <- HTODemux(seu2_c2)
Cutoff for BAL 09 : 64 reads
Cutoff for BAL 10 : 20 reads
Cutoff for BAL 11 : 23 reads
Cutoff for BAL 12 : 107 reads
Cutoff for BAL 13 : 27 reads
Cutoff for BAL 14 : 22 reads
Cutoff for BAL 15 : 127 reads
Cutoff for BAL 16 : 100 reads
seu2_c2$HTODemux_calls <- HDmux$hash.ID
HDmux <- HTODemux(seu3_c1)
Cutoff for BAL 17 : 14 reads
Cutoff for BAL 18 : 22 reads
Cutoff for BAL 19 : 11 reads
Cutoff for BAL 20 : 51 reads
Cutoff for BAL 21 : 13 reads
Cutoff for BAL 22 : 25 reads
Cutoff for BAL 23 : 46 reads
Cutoff for BAL 24 : 73 reads
seu3_c1$HTODemux_calls <- HDmux$hash.ID
HDmux <- HTODemux(seu3_c2)
Cutoff for BAL 17 : 18 reads
Cutoff for BAL 18 : 33 reads
Cutoff for BAL 19 : 21 reads
Cutoff for BAL 20 : 68 reads
Cutoff for BAL 21 : 19 reads
Cutoff for BAL 22 : 18 reads
Cutoff for BAL 23 : 85 reads
Cutoff for BAL 24 : 80 reads
seu3_c2$HTODemux_calls <- HDmux$hash.ID
GMM-Demux runs from the command line on cells-by-HTOs tables, so we need to write the transpose of each counts matrix to a .csv file.
write.csv(t(as.matrix(batch1_c1_counts)), here("data", "BAL_data", "GMM-Demux", "batch1_c1_hto_counts_transpose.csv"))
write.csv(t(as.matrix(batch1_c2_counts)), here("data", "BAL_data", "GMM-Demux", "batch1_c2_hto_counts_transpose.csv"))
write.csv(t(as.matrix(batch2_c1_counts)), here("data", "BAL_data", "GMM-Demux", "batch2_c1_hto_counts_transpose.csv"))
write.csv(t(as.matrix(batch2_c2_counts)), here("data", "BAL_data", "GMM-Demux", "batch2_c2_hto_counts_transpose.csv"))
write.csv(t(as.matrix(batch3_c1_counts)), here("data", "BAL_data", "GMM-Demux", "batch3_c1_hto_counts_transpose.csv"))
write.csv(t(as.matrix(batch3_c2_counts)), here("data", "BAL_data", "GMM-Demux", "batch3_c2_hto_counts_transpose.csv"))
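GMM-Demux expects a cells-by-HTOs table while the Seurat counts matrices are HTOs-by-cells, which is why each matrix is transposed before writing. A self-contained sketch of the round trip with a toy matrix:

```r
# Toy HTOs-by-cells counts matrix (2 HTOs, 2 cells).
counts <- matrix(c(10, 0, 2, 55), nrow = 2,
                 dimnames = list(c("HTO1", "HTO2"), c("cellA", "cellB")))

csv_path <- file.path(tempdir(), "toy_hto_counts_transpose.csv")
write.csv(t(counts), csv_path)

# Reading it back: rows are now cells, columns are HTOs.
back <- read.csv(csv_path, row.names = 1)
```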
See run_GMM_demux_BAL.sh script for running GMM-Demux
This function takes the output of GMM-Demux and makes its calls consistent with the other methods considered here.
create_gmm_demux_factor <- function(seu, GMM_path, hto_list) {
  # Read in the output; we have to use the "full" report, not the simplified one.
  calls <- read.csv(paste0(GMM_path, "/GMM_full.csv"), row.names = 1)
  # Read in the names of the clusters
  cluster_names <- read.table(paste0(GMM_path, "/GMM_full.config"), sep = ",")
  names(cluster_names) <- c("Cluster_id", "assignment")
  # Fix the formatting of the assignment names, which carry a leading space.
  cluster_names$assignment <- gsub(x = cluster_names$assignment, pattern = '^ ', replacement = '')
  # Add cell barcodes
  calls$Barcode <- rownames(calls)
  calls <- merge(calls, cluster_names, by = "Cluster_id", sort = FALSE)
  # merge() does not preserve the input row order, so re-order to match the object's cells
  calls <- calls[order(match(calls$Barcode, colnames(seu))), ]
  # Rename the negative cluster for consistency
  calls$assignment[calls$assignment == "negative"] <- "Negative"
  # Put all the multiplet states into one assignment category
  calls$assignment[!calls$assignment %in% c("Negative", hto_list)] <- "Doublet"
  return(as.factor(calls$assignment))
}
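The last two steps of the function are the part that matters for consistency: any assignment that is neither "negative" nor one of the known single HTOs (e.g. a multiplet label naming several HTOs) is collapsed into a single "Doublet" category. In isolation, with hypothetical labels:

```r
hto_list <- c("BAL 01", "BAL 02")
assignment <- c("BAL 01", "negative", "BAL 01-BAL 02", "BAL 02")

# Rename the negative cluster, then collapse everything else non-singlet.
assignment[assignment == "negative"] <- "Negative"
assignment[!assignment %in% c("Negative", hto_list)] <- "Doublet"
assignment
```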
Add to objects
#Batch 1
seu1_c1$GMMDemux_calls <- create_gmm_demux_factor(seu1_c1, here("data", "BAL_data", "GMM-Demux", "gmm_out_batch1_c1", "full_report"), HTO_list_batch1)
seu1_c2$GMMDemux_calls <- create_gmm_demux_factor(seu1_c2, here("data", "BAL_data", "GMM-Demux", "gmm_out_batch1_c2", "full_report"), HTO_list_batch1)
#Batch 2
seu2_c1$GMMDemux_calls <- create_gmm_demux_factor(seu2_c1, here("data", "BAL_data", "GMM-Demux", "gmm_out_batch2_c1", "full_report"), HTO_list_batch2)
seu2_c2$GMMDemux_calls <- create_gmm_demux_factor(seu2_c2, here("data", "BAL_data", "GMM-Demux", "gmm_out_batch2_c2", "full_report"), HTO_list_batch2)
#Batch 3
seu3_c1$GMMDemux_calls <- create_gmm_demux_factor(seu3_c1, here("data", "BAL_data", "GMM-Demux", "gmm_out_batch3_c1", "full_report"), HTO_list_batch3)
seu3_c2$GMMDemux_calls <- create_gmm_demux_factor(seu3_c2, here("data", "BAL_data", "GMM-Demux", "gmm_out_batch3_c2", "full_report"), HTO_list_batch3)
Next is deMULTIplex, run via Seurat's MULTIseqDemux wrapper function.
seu1_c1$deMULTIplex_calls <- MULTIseqDemux(seu1_c1, autoThresh = TRUE)$MULTI_ID
Iteration 1
Using quantile 0.35
Iteration 2
Using quantile 0.35
Iteration 3
Using quantile 0.35
seu1_c2$deMULTIplex_calls <- MULTIseqDemux(seu1_c2, autoThresh = TRUE)$MULTI_ID
Iteration 1
Using quantile 0.35
Iteration 2
Using quantile 0.35
Iteration 3
Using quantile 0.35
Iteration 4
Using quantile 0.35
seu2_c1$deMULTIplex_calls <- MULTIseqDemux(seu2_c1, autoThresh = TRUE)$MULTI_ID
Iteration 1
Using quantile 0.35
Iteration 2
Using quantile 0.35
Iteration 3
Using quantile 0.35
seu2_c2$deMULTIplex_calls <- MULTIseqDemux(seu2_c2, autoThresh = TRUE)$MULTI_ID
Iteration 1
Using quantile 0.3
Iteration 2
Using quantile 0.3
Iteration 3
Using quantile 0.3
seu3_c1$deMULTIplex_calls <- MULTIseqDemux(seu3_c1, autoThresh = TRUE)$MULTI_ID
Iteration 1
Using quantile 0.45
No threshold found for BAL 19...
No threshold found for BAL 23...
Iteration 2
Using quantile 0.45
No threshold found for BAL 19...
No threshold found for BAL 23...
No threshold found for BAL 22...
Iteration 3
Using quantile 0.45
No threshold found for BAL 22...
Iteration 4
Using quantile 0.55
Iteration 5
Using quantile 0.55
seu3_c2$deMULTIplex_calls <- MULTIseqDemux(seu3_c2, autoThresh = TRUE)$MULTI_ID
Iteration 1
Using quantile 0.35
Iteration 2
Using quantile 0.35
Iteration 3
Using quantile 0.35
cellhashR's BFF_raw and BFF_cluster methods. These need to be run on the raw counts matrix.
cellhashR_calls <- GenerateCellHashingCalls(barcodeMatrix = batch1_c1_counts, methods = c("bff_raw", "bff_cluster"), doTSNE = FALSE, doHeatmap = FALSE)
[1] "Converting input data.frame to a matrix"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 8"
[1] "Running BFF_raw"
[1] "Thresholds:"
[1] "BAL 08: 88.5286373638308"
[1] "BAL 07: 79.0940963329433"
[1] "BAL 06: 59.3081555250612"
[1] "BAL 05: 118.254083321895"
[1] "BAL 04: 40.1613960189724"
[1] "BAL 03: 94.3265319348506"
[1] "BAL 02: 888.584445369128"
[1] "BAL 01: 103.138027031719"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 8"
[1] "Running BFF_cluster"
[1] "Doublet threshold: 0.05"
[1] "Neg threshold: 0.05"
[1] "Min distance as fraction of distance between peaks: 0.1"
[1] "Thresholds:"
[1] "BAL 08: 88.5286373638308"
[1] "BAL 07: 79.0940963329433"
[1] "BAL 06: 59.3081555250612"
[1] "BAL 05: 118.254083321895"
[1] "BAL 04: 40.1613960189724"
[1] "BAL 03: 94.3265319348506"
[1] "BAL 02: 888.584445369128"
[1] "BAL 01: 103.138027031719"
[1] "Generating consensus calls"
[1] "Consensus calls will be generated using: bff_raw,bff_cluster"
[1] "Total concordant: 10754"
[1] "Total discordant: 825 (7.12%)"
seu1_c1$BFF_raw_calls <- cellhashR_calls$bff_raw
seu1_c1$BFF_cluster_calls <- cellhashR_calls$bff_cluster
cellhashR_calls <- GenerateCellHashingCalls(barcodeMatrix = batch1_c2_counts, methods = c("bff_raw", "bff_cluster"), doTSNE = FALSE, doHeatmap = FALSE)
[1] "Converting input data.frame to a matrix"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 8"
[1] "Running BFF_raw"
[1] "Thresholds:"
[1] "BAL 08: 153.703282800423"
[1] "BAL 07: 133.643246005432"
[1] "BAL 06: 96.3579502744319"
[1] "BAL 05: 173.236542609504"
[1] "BAL 04: 100.863612367615"
[1] "BAL 03: 124.030608336188"
[1] "BAL 02: 227.676246125818"
[1] "BAL 01: 269.704047931178"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 8"
[1] "Running BFF_cluster"
[1] "Doublet threshold: 0.05"
[1] "Neg threshold: 0.05"
[1] "Min distance as fraction of distance between peaks: 0.1"
[1] "Thresholds:"
[1] "BAL 08: 153.703282800423"
[1] "BAL 07: 133.643246005432"
[1] "BAL 06: 96.3579502744319"
[1] "BAL 05: 173.236542609504"
[1] "BAL 04: 100.863612367615"
[1] "BAL 03: 124.030608336188"
[1] "BAL 02: 227.676246125818"
[1] "BAL 01: 269.704047931178"
[1] "Generating consensus calls"
[1] "Consensus calls will be generated using: bff_raw,bff_cluster"
[1] "Total concordant: 11221"
[1] "Total discordant: 1291 (10.32%)"
seu1_c2$BFF_raw_calls <- cellhashR_calls$bff_raw
seu1_c2$BFF_cluster_calls <- cellhashR_calls$bff_cluster
cellhashR_calls <- GenerateCellHashingCalls(barcodeMatrix = batch2_c1_counts, methods = c("bff_raw", "bff_cluster"), doTSNE = FALSE, doHeatmap = FALSE)
[1] "Converting input data.frame to a matrix"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 8"
[1] "Running BFF_raw"
[1] "Only one peak found, using max value as cutoff: BAL 10"
[1] "Only one peak found, using max value as cutoff: BAL 13"
[1] "Only one peak found, using max value as cutoff: BAL 14"
[1] "Thresholds:"
[1] "BAL 16: 116.767395768808"
[1] "BAL 15: 92.8420586232944"
[1] "BAL 14: 174971.187265843"
[1] "BAL 13: 34516.284451742"
[1] "BAL 12: 100.081954141623"
[1] "BAL 11: 42.8472794350699"
[1] "BAL 10: 3964.00460172807"
[1] "BAL 09: 141.060749374794"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 8"
[1] "Running BFF_cluster"
[1] "Doublet threshold: 0.05"
[1] "Neg threshold: 0.05"
[1] "Min distance as fraction of distance between peaks: 0.1"
[1] "Only one peak found, using max value as cutoff: BAL 10"
[1] "Only one peak found, using max value as cutoff: BAL 13"
[1] "Only one peak found, using max value as cutoff: BAL 14"
[1] "Thresholds:"
[1] "BAL 16: 116.767395768808"
[1] "BAL 15: 92.8420586232944"
[1] "BAL 14: 174971.187265843"
[1] "BAL 13: 34516.284451742"
[1] "BAL 12: 100.081954141623"
[1] "BAL 11: 42.8472794350699"
[1] "BAL 10: 3964.00460172807"
[1] "BAL 09: 141.060749374794"
[1] "Only one peak found, using max value as cutoff: BAL 10"
[1] "Only one peak found, using max value as cutoff: BAL 13"
[1] "Only one peak found, using max value as cutoff: BAL 14"
[1] "Smoothing parameter j = 5"
[1] "Smoothing parameter j = 10"
[1] "Smoothing parameter j = 15"
[1] "Smoothing parameter j = 20"
[1] "Smoothing parameter j = 25"
[1] "Generating consensus calls"
[1] "Consensus calls will be generated using: bff_raw,bff_cluster"
[1] "Total concordant: 9521"
[1] "Total discordant: 14514 (60.39%)"
seu2_c1$BFF_raw_calls <- cellhashR_calls$bff_raw
seu2_c1$BFF_cluster_calls <- cellhashR_calls$bff_cluster
cellhashR_calls <- GenerateCellHashingCalls(barcodeMatrix = batch2_c2_counts, methods = c("bff_raw", "bff_cluster"), doTSNE = FALSE, doHeatmap = FALSE)
[1] "Converting input data.frame to a matrix"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 8"
[1] "Running BFF_raw"
[1] "Only one peak found, using max value as cutoff: BAL 10"
[1] "Only one peak found, using max value as cutoff: BAL 14"
[1] "Thresholds:"
[1] "BAL 16: 144.087607973687"
[1] "BAL 15: 99.1574648213469"
[1] "BAL 14: 540748.32248393"
[1] "BAL 13: 112.760207720256"
[1] "BAL 12: 121.974958078504"
[1] "BAL 11: 44.1513268472175"
[1] "BAL 10: 9749.73495120045"
[1] "BAL 09: 115.609426366988"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 8"
[1] "Running BFF_cluster"
[1] "Doublet threshold: 0.05"
[1] "Neg threshold: 0.05"
[1] "Min distance as fraction of distance between peaks: 0.1"
[1] "Only one peak found, using max value as cutoff: BAL 10"
[1] "Only one peak found, using max value as cutoff: BAL 14"
[1] "Thresholds:"
[1] "BAL 16: 144.087607973687"
[1] "BAL 15: 99.1574648213469"
[1] "BAL 14: 540748.32248393"
[1] "BAL 13: 112.760207720256"
[1] "BAL 12: 121.974958078504"
[1] "BAL 11: 44.1513268472175"
[1] "BAL 10: 9749.73495120045"
[1] "BAL 09: 115.609426366988"
[1] "Only one peak found, using max value as cutoff: BAL 10"
[1] "Only one peak found, using max value as cutoff: BAL 14"
[1] "Smoothing parameter j = 5"
[1] "Smoothing parameter j = 10"
[1] "Smoothing parameter j = 15"
[1] "Smoothing parameter j = 20"
[1] "Smoothing parameter j = 25"
[1] "Smoothing parameter j = 30"
[1] "Smoothing parameter j = 35"
[1] "Generating consensus calls"
[1] "Consensus calls will be generated using: bff_raw,bff_cluster"
[1] "Total concordant: 9488"
[1] "Total discordant: 15318 (61.75%)"
seu2_c2$BFF_raw_calls <- cellhashR_calls$bff_raw
seu2_c2$BFF_cluster_calls <- cellhashR_calls$bff_cluster
cellhashR_calls <- GenerateCellHashingCalls(barcodeMatrix = batch3_c1_counts, methods = c("bff_raw", "bff_cluster"), doTSNE = FALSE, doHeatmap = FALSE)
[1] "Converting input data.frame to a matrix"
[1] "Starting BFF"
[1] "rows dropped for low counts: 2 of 8"
[1] "Running BFF_raw"
[1] "Only one peak found, using max value as cutoff: BAL 17"
[1] "Only one peak found, using max value as cutoff: BAL 23"
[1] "Thresholds:"
[1] "BAL 24: 71.9803810612437"
[1] "BAL 23: 52018.8671908447"
[1] "BAL 20: 60.4478104537598"
[1] "BAL 19: 23.2836098685648"
[1] "BAL 18: 63.4207352591075"
[1] "BAL 17: 44498.4901677433"
[1] "Starting BFF"
[1] "rows dropped for low counts: 2 of 8"
[1] "Running BFF_cluster"
[1] "Doublet threshold: 0.05"
[1] "Neg threshold: 0.05"
[1] "Min distance as fraction of distance between peaks: 0.1"
[1] "Only one peak found, using max value as cutoff: BAL 17"
[1] "Only one peak found, using max value as cutoff: BAL 23"
[1] "Thresholds:"
[1] "BAL 24: 71.9803810612437"
[1] "BAL 23: 52018.8671908447"
[1] "BAL 20: 60.4478104537598"
[1] "BAL 19: 23.2836098685648"
[1] "BAL 18: 63.4207352591075"
[1] "BAL 17: 44498.4901677433"
[1] "Only one peak found, using max value as cutoff: BAL 23"
[1] "Smoothing parameter j = 5"
[1] "Smoothing parameter j = 10"
[1] "Smoothing parameter j = 15"
[1] "Smoothing parameter j = 20"
[1] "Smoothing parameter j = 25"
[1] "Generating consensus calls"
[1] "Consensus calls will be generated using: bff_raw,bff_cluster"
[1] "Total concordant: 24809"
[1] "Total discordant: 6422 (20.56%)"
seu3_c1$BFF_raw_calls <- cellhashR_calls$bff_raw
seu3_c1$BFF_cluster_calls <- cellhashR_calls$bff_cluster
cellhashR_calls <- GenerateCellHashingCalls(barcodeMatrix = batch3_c2_counts, methods = c("bff_raw", "bff_cluster"), doTSNE = FALSE, doHeatmap = FALSE)
[1] "Converting input data.frame to a matrix"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 8"
[1] "Running BFF_raw"
[1] "Only one peak found, using max value as cutoff: BAL 17"
[1] "Only one peak found, using max value as cutoff: BAL 21"
[1] "Thresholds:"
[1] "BAL 24: 91.8521103309011"
[1] "BAL 23: 78.3116370807488"
[1] "BAL 22: 40.8624667015159"
[1] "BAL 21: 2928.84132708352"
[1] "BAL 20: 78.9682591903001"
[1] "BAL 19: 31.11498098035"
[1] "BAL 18: 90.8327820367026"
[1] "BAL 17: 203945.401462095"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 8"
[1] "Running BFF_cluster"
[1] "Doublet threshold: 0.05"
[1] "Neg threshold: 0.05"
[1] "Min distance as fraction of distance between peaks: 0.1"
[1] "Only one peak found, using max value as cutoff: BAL 17"
[1] "Only one peak found, using max value as cutoff: BAL 21"
[1] "Thresholds:"
[1] "BAL 24: 91.8521103309011"
[1] "BAL 23: 78.3116370807488"
[1] "BAL 22: 40.8624667015159"
[1] "BAL 21: 2928.84132708352"
[1] "BAL 20: 78.9682591903001"
[1] "BAL 19: 31.11498098035"
[1] "BAL 18: 90.8327820367026"
[1] "BAL 17: 203945.401462095"
[1] "Only one peak found, using max value as cutoff: BAL 17"
[1] "Only one peak found, using max value as cutoff: BAL 21"
[1] "Smoothing parameter j = 5"
[1] "Smoothing parameter j = 10"
[1] "Smoothing parameter j = 15"
[1] "Smoothing parameter j = 20"
[1] "Smoothing parameter j = 25"
[1] "Smoothing parameter j = 30"
[1] "Smoothing parameter j = 35"
[1] "Smoothing parameter j = 40"
[1] "Smoothing parameter j = 45"
[1] "Smoothing parameter j = 50"
[1] "Smoothing parameter j = 55"
[1] "Smoothing parameter j = 60"
[1] "Generating consensus calls"
[1] "Consensus calls will be generated using: bff_raw,bff_cluster"
[1] "Total concordant: 12770"
[1] "Total discordant: 18305 (58.91%)"
seu3_c2$BFF_raw_calls <- cellhashR_calls$bff_raw
seu3_c2$BFF_cluster_calls <- cellhashR_calls$bff_cluster
demuxmix has two flavours: one that takes the RNA library size into account and one that works on the hashtag libraries alone. The RNA-aware model requires the RNA counts matrix and doesn't change the results much, so we'll just use the basic version.
This function turns the output of demuxmix into something consistent with the other methods
demuxmix_calls_consistent <- function(seurat_object, model = "naive", hto_list) {
hto_counts <- as.matrix(GetAssayData(seurat_object[["HTO"]], slot = "counts"))
dmm <- demuxmix(hto_counts, model = model)
dmm_calls <- dmmClassify(dmm)
calls_out <- case_when(dmm_calls$HTO %in% hto_list ~ dmm_calls$HTO,
!dmm_calls$HTO %in% hto_list ~ case_when(
dmm_calls$Type == "multiplet" ~ "Doublet",
dmm_calls$Type %in% c("negative", "uncertain") ~ "Negative")
)
return(as.factor(calls_out))
}
seu1_c1$demuxmix_calls <- demuxmix_calls_consistent(seu1_c1, hto_list = HTO_list_batch1)
seu1_c2$demuxmix_calls <- demuxmix_calls_consistent(seu1_c2, hto_list = HTO_list_batch1)
seu2_c1$demuxmix_calls <- demuxmix_calls_consistent(seu2_c1, hto_list = HTO_list_batch2)
seu2_c2$demuxmix_calls <- demuxmix_calls_consistent(seu2_c2, hto_list = HTO_list_batch2)
seu3_c1$demuxmix_calls <- demuxmix_calls_consistent(seu3_c1, hto_list = HTO_list_batch3)
seu3_c2$demuxmix_calls <- demuxmix_calls_consistent(seu3_c2, hto_list = HTO_list_batch3)
Merge the captures back into the batches
seu1 <- merge(seu1_c1, seu1_c2)
seu2 <- merge(seu2_c1, seu2_c2)
seu3 <- merge(seu3_c1, seu3_c2)
Add a factor for batches
seu1$Batch <- "Batch 1"
seu2$Batch <- "Batch 2"
seu3$Batch <- "Batch 3"
Save Seurat objects with all the hashtag assignments
#saveRDS(seu1, here("data", "BAL_data", "batch1_all_methods.SEU.rds"))
#saveRDS(seu2, here("data", "BAL_data", "batch2_all_methods.SEU.rds"))
#saveRDS(seu3, here("data", "BAL_data", "batch3_all_methods.SEU.rds"))
We’re looking at the fraction of droplets each method (including genetics) assigns as either singlet, doublet or negative (Figure 2).
method_calls <- c("hashedDrops_calls",
"hashedDrops_default_calls",
"hashsolo_calls",
"HTODemux_calls",
"GMMDemux_calls",
"deMULTIplex_calls",
"BFF_raw_calls",
"BFF_cluster_calls",
"demuxmix_calls")
for (method in method_calls) {
  seu1[[paste0(method, "_donors")]] <- as.factor(unlist(HTO_list_batch1[unlist(seu1[[method]])]))
  seu2[[paste0(method, "_donors")]] <- as.factor(unlist(HTO_list_batch2[unlist(seu2[[method]])]))
  seu3[[paste0(method, "_donors")]] <- as.factor(unlist(HTO_list_batch3[unlist(seu3[[method]])]))
}
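The loop above translates each method's HTO-level calls into donor labels by indexing the batch's HTO-to-donor lookup with the call values. The same idea with a hypothetical named vector (the real HTO_list_batch* objects are defined earlier in the analysis):

```r
# Hypothetical lookup from call labels to donor IDs; "Doublet" and
# "Negative" map to themselves so non-singlet calls pass through.
HTO_to_donor <- c("BAL 01" = "donor_A", "BAL 02" = "donor_B",
                  "Doublet" = "Doublet", "Negative" = "Negative")

calls <- c("BAL 02", "Doublet", "BAL 01")
donors <- as.factor(unname(HTO_to_donor[calls]))
donors
```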
seu <- merge(seu1, c(seu2, seu3))
Warning in CheckDuplicateCellNames(object.list = objects): Some cell names are
duplicated across objects provided. Renaming to enforce unique cell names.
seu$Batch <- as.factor(seu$Batch)
Need to make a factor which reduces the assignments to one of (“Singlet”, “Doublet”, “Negative”).
sd_or_u_hashtags <- function(seurat_object, method) {
return(case_when(seurat_object[[method]] == "Doublet" ~ "Doublet",
seurat_object[[method]] == "Negative" ~ "Negative",
TRUE ~ "Singlet"))
}
for (method in method_calls) {
  seu[[paste0(method, "_category")]] <- sd_or_u_hashtags(seu, method)
}
seu$genetic_donor_category <- case_when(seu$genetic_donor == "Doublet" ~ "Doublet",
seu$genetic_donor == "Negative" ~ "Negative",
TRUE ~ "Singlet")
p1 <- dittoBarPlot(seu, var = "genetic_donor_category", group.by = "Batch") + NoLegend() +
ggtitle("vireo (genetics)") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank())
p2 <- dittoBarPlot(seu, var = "BFF_cluster_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("BFF_cluster") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank(), axis.text.y = element_blank())
p3 <- dittoBarPlot(seu, var = "BFF_raw_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("BFF_raw") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank())
p4 <- dittoBarPlot(seu, var = "deMULTIplex_calls_category", group.by = "Batch") +
ggtitle("deMULTIplex") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank(), axis.text.y = element_blank())
p5 <- dittoBarPlot(seu, var = "demuxmix_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("demuxmix") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank())
p6 <- dittoBarPlot(seu, var = "GMMDemux_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("GMM-Demux") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank(), axis.text.y = element_blank())
p7 <- dittoBarPlot(seu, var = "hashedDrops_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("hashedDrops - best") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank())
p8 <- dittoBarPlot(seu, var = "hashedDrops_default_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("hashedDrops - default") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank(), axis.text.y = element_blank())
p9 <- dittoBarPlot(seu, var = "hashsolo_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("HashSolo") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_text(size = 14))
p10 <- dittoBarPlot(seu, var = "HTODemux_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("HTODemux") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_text(size = 14), axis.text.y = element_blank())
(p1 + p2) / (p3 + p4) / (p5 + p6) / (p7 + p8) / (p9 + p10)
#ggsave("category_fractions.png",
# plot = (p1 + p2) / (p3 + p4) / (p5 + p6) / (p7 + p8) / (p9 + p10),
# device = "png",
# path = here("figures"),
# width = 8, height = 10,
# units = "in",
# dpi = 350)
We compute the F-score of each of the possible singlet assignments in each batch.
# Helper function
calculate_HTO_fscore <- function(seurat_object, donor_hto_list, method) {
  calls <- seurat_object[[method]]
  f <- NULL
  for (HTO in donor_hto_list) {
    tp <- sum(calls == HTO & donor_hto_list[seurat_object$genetic_donor] == HTO) # True positives
    fp <- sum(calls == HTO & donor_hto_list[seurat_object$genetic_donor] != HTO) # False positives
    fn <- sum(calls != HTO & donor_hto_list[seurat_object$genetic_donor] == HTO) # False negatives
    f <- c(f, tp / (tp + 0.5 * (fp + fn)))
  }
  f <- c(f, mean(f)) # Append the mean F-score
  names(f) <- c(donor_hto_list, "Average")
  return(f)
}
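The score computed here, tp / (tp + 0.5 * (fp + fn)), is the F1 score, which is equivalent to the harmonic mean of precision and recall. A worked toy example with made-up counts:

```r
tp <- 8; fp <- 2; fn <- 4

f1 <- tp / (tp + 0.5 * (fp + fn))

# Equivalent form via precision and recall.
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1_alt <- 2 * precision * recall / (precision + recall)
```

Both forms give 8/11, roughly 0.727.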
Fscore_hashedDrops_batch1 <- calculate_HTO_fscore(seu1, donor_list_batch1[1:8], "hashedDrops_calls")
Fscore_hashedDrops_batch2 <- calculate_HTO_fscore(seu2, donor_list_batch2[1:8], "hashedDrops_calls")
Fscore_hashedDrops_batch3 <- calculate_HTO_fscore(seu3, donor_list_batch3[1:8], "hashedDrops_calls")
Fscore_hashedDrops_default_batch1 <- calculate_HTO_fscore(seu1, donor_list_batch1[1:8], "hashedDrops_default_calls")
Fscore_hashedDrops_default_batch2 <- calculate_HTO_fscore(seu2, donor_list_batch2[1:8], "hashedDrops_default_calls")
Fscore_hashedDrops_default_batch3 <- calculate_HTO_fscore(seu3, donor_list_batch3[1:8], "hashedDrops_default_calls")
Fscore_hashsolo_batch1 <- calculate_HTO_fscore(seu1, donor_list_batch1[1:8], "hashsolo_calls")
Fscore_hashsolo_batch2 <- calculate_HTO_fscore(seu2, donor_list_batch2[1:8], "hashsolo_calls")
Fscore_hashsolo_batch3 <- calculate_HTO_fscore(seu3, donor_list_batch3[1:8], "hashsolo_calls")
Fscore_HTODemux_batch1 <- calculate_HTO_fscore(seu1, donor_list_batch1[1:8], "HTODemux_calls")
Fscore_HTODemux_batch2 <- calculate_HTO_fscore(seu2, donor_list_batch2[1:8], "HTODemux_calls")
Fscore_HTODemux_batch3 <- calculate_HTO_fscore(seu3, donor_list_batch3[1:8], "HTODemux_calls")
Fscore_GMMDemux_batch1 <- calculate_HTO_fscore(seu1, donor_list_batch1[1:8], "GMMDemux_calls")
Fscore_GMMDemux_batch2 <- calculate_HTO_fscore(seu2, donor_list_batch2[1:8], "GMMDemux_calls")
Fscore_GMMDemux_batch3 <- calculate_HTO_fscore(seu3, donor_list_batch3[1:8], "GMMDemux_calls")
Fscore_deMULTIplex_batch1 <- calculate_HTO_fscore(seu1, donor_list_batch1[1:8], "deMULTIplex_calls")
Fscore_deMULTIplex_batch2 <- calculate_HTO_fscore(seu2, donor_list_batch2[1:8], "deMULTIplex_calls")
Fscore_deMULTIplex_batch3 <- calculate_HTO_fscore(seu3, donor_list_batch3[1:8], "deMULTIplex_calls")
Fscore_BFF_raw_batch1 <- calculate_HTO_fscore(seu1, donor_list_batch1[1:8], "BFF_raw_calls")
Fscore_BFF_raw_batch2 <- calculate_HTO_fscore(seu2, donor_list_batch2[1:8], "BFF_raw_calls")
Fscore_BFF_raw_batch3 <- calculate_HTO_fscore(seu3, donor_list_batch3[1:8], "BFF_raw_calls")
Fscore_BFF_cluster_batch1 <- calculate_HTO_fscore(seu1, donor_list_batch1[1:8], "BFF_cluster_calls")
Fscore_BFF_cluster_batch2 <- calculate_HTO_fscore(seu2, donor_list_batch2[1:8], "BFF_cluster_calls")
Fscore_BFF_cluster_batch3 <- calculate_HTO_fscore(seu3, donor_list_batch3[1:8], "BFF_cluster_calls")
Fscore_demuxmix_batch1 <- calculate_HTO_fscore(seu1, donor_list_batch1[1:8], "demuxmix_calls")
Fscore_demuxmix_batch2 <- calculate_HTO_fscore(seu2, donor_list_batch2[1:8], "demuxmix_calls")
Fscore_demuxmix_batch3 <- calculate_HTO_fscore(seu3, donor_list_batch3[1:8], "demuxmix_calls")
Put it all together for visualisation
Fscore_matrix_batch1 <- data.frame("HTO" = c(HTO_list_batch1[1:8], "Mean (Batch 1)"),
"hashedDrops" = Fscore_hashedDrops_batch1,
"hashedDrops_default" = Fscore_hashedDrops_default_batch1,
"HashSolo" = Fscore_hashsolo_batch1,
"HTODemux" = Fscore_HTODemux_batch1,
"GMM_Demux" = Fscore_GMMDemux_batch1,
"deMULTIplex" = Fscore_deMULTIplex_batch1,
"BFF_raw" = Fscore_BFF_raw_batch1,
"BFF_cluster" = Fscore_BFF_cluster_batch1,
"demuxmix" = Fscore_demuxmix_batch1) %>%
pivot_longer(cols = c("hashedDrops", "hashedDrops_default", "HashSolo",
"HTODemux", "GMM_Demux", "deMULTIplex",
"BFF_raw", "BFF_cluster", "demuxmix"),
names_to = "method",
values_to = "Fscore")
Fscore_matrix_batch2 <- data.frame("HTO" = c(HTO_list_batch2[1:8], "Mean (Batch 2)"),
"hashedDrops" = Fscore_hashedDrops_batch2,
"hashedDrops_default" = Fscore_hashedDrops_default_batch2,
"HashSolo" = Fscore_hashsolo_batch2,
"HTODemux" = Fscore_HTODemux_batch2,
"GMM_Demux" = Fscore_GMMDemux_batch2,
"deMULTIplex" = Fscore_deMULTIplex_batch2,
"BFF_raw" = Fscore_BFF_raw_batch2,
"BFF_cluster" = Fscore_BFF_cluster_batch2,
"demuxmix" = Fscore_demuxmix_batch2) %>%
pivot_longer(cols = c("hashedDrops", "hashedDrops_default", "HashSolo",
"HTODemux", "GMM_Demux", "deMULTIplex",
"BFF_raw", "BFF_cluster", "demuxmix"),
names_to = "method",
values_to = "Fscore")
Fscore_matrix_batch3 <- data.frame("HTO" = c(HTO_list_batch3[1:8], "Mean (Batch 3)"),
"hashedDrops" = Fscore_hashedDrops_batch3,
"hashedDrops_default" = Fscore_hashedDrops_default_batch3,
"HashSolo" = Fscore_hashsolo_batch3,
"HTODemux" = Fscore_HTODemux_batch3,
"GMM_Demux" = Fscore_GMMDemux_batch3,
"deMULTIplex" = Fscore_deMULTIplex_batch3,
"BFF_raw" = Fscore_BFF_raw_batch3,
"BFF_cluster" = Fscore_BFF_cluster_batch3,
"demuxmix" = Fscore_demuxmix_batch3) %>%
pivot_longer(cols = c("hashedDrops", "hashedDrops_default", "HashSolo",
"HTODemux", "GMM_Demux", "deMULTIplex",
"BFF_raw", "BFF_cluster", "demuxmix"),
names_to = "method",
values_to = "Fscore")
All of this is easier to digest as a heatmap: rows are methods, columns are samples, and the colour scale is the F-score.
Fscore_matrix_batch1$Batch <- "Batch 1"
Fscore_matrix_batch2$Batch <- "Batch 2"
Fscore_matrix_batch3$Batch <- "Batch 3"
Fscore_matrix <- bind_rows(Fscore_matrix_batch1,
Fscore_matrix_batch2,
Fscore_matrix_batch3)
Fscore_matrix <- Fscore_matrix[!Fscore_matrix$HTO %in% c("Mean (Batch 1)", "Mean (Batch 2)", "Mean (Batch 3)"), ]
Fscore_matrix %>%
group_by(Batch) %>%
heatmap(.row = method,
.column = HTO,
.value = Fscore,
column_title = "F-score - BAL data",
cluster_rows = TRUE,
row_names_gp = gpar(fontsize = 10),
show_row_dend = FALSE,
row_names_side = "left",
row_title = "",
cluster_columns = FALSE,
column_names_gp = gpar(fontsize = 10),
palette_value = plasma(3)) %>%
wrap_heatmap() -> p1
p1
#ggsave(here("paper_latex", "figures", "fscore_heatmap.png"),
# p1,
# device = "png",
# width = 8, height = 5,
# units = "in",
# dpi = 350
# )
Extract the numerical values. Note that the mean rows are labelled per batch (e.g. "Mean (Batch 1)"), so we need to filter on those labels; filtering on the bare label "Mean" returns an empty tibble.
Fscore_matrix_batch1[Fscore_matrix_batch1$HTO == "Mean (Batch 1)", ]
Fscore_matrix_batch2[Fscore_matrix_batch2$HTO == "Mean (Batch 2)", ]
Fscore_matrix_batch3[Fscore_matrix_batch3$HTO == "Mean (Batch 3)", ]
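The per-batch mean F-scores can also be computed directly from the combined long-format table, which avoids relying on the pre-computed "Mean" rows; a minimal sketch using dplyr, assuming `Fscore_matrix` holds the columns `method`, `Fscore`, and `Batch` constructed above:

```r
library(dplyr)

# Mean F-score per method within each batch, computed from the
# per-HTO values in the long-format table.
Fscore_matrix %>%
  group_by(Batch, method) %>%
  summarise(mean_Fscore = mean(Fscore), .groups = "drop") %>%
  arrange(Batch, desc(mean_Fscore))
```

This ranks methods within each batch by their mean F-score, so the ordering can be read off directly.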
We are also concerned that some of these methods may be assigning genetic doublets as singlets, so we examine the fraction of genetic doublets assigned to each of the possible HTO categories (Doublet, Negative, or one of the singlet HTOs).
batch1_doublets <- seu1[, seu1$genetic_donor == "Doublet"]
batch2_doublets <- seu2[, seu2$genetic_donor == "Doublet"]
batch3_doublets <- seu3[, seu3$genetic_donor == "Doublet"]
batch1_doublet_doublet <- NULL
batch1_doublet_negative <- NULL
batch1_doublet_singlet <- NULL
batch2_doublet_doublet <- NULL
batch2_doublet_negative <- NULL
batch2_doublet_singlet <- NULL
batch3_doublet_doublet <- NULL
batch3_doublet_negative <- NULL
batch3_doublet_singlet <- NULL
for (method in method_calls) {
batch1_doublet_doublet <- c(batch1_doublet_doublet, sum(seu1$genetic_donor == "Doublet" & seu1[[method]] == "Doublet") / sum(seu1$genetic_donor == "Doublet"))
batch1_doublet_negative <- c(batch1_doublet_negative, sum(seu1$genetic_donor == "Doublet" & seu1[[method]] == "Negative") / sum(seu1$genetic_donor == "Doublet"))
batch1_doublet_singlet <- c(batch1_doublet_singlet, sum(seu1$genetic_donor == "Doublet" & Reduce("|", lapply(HTO_list_batch1[1:8], function(x) seu1[[method]] == x))) / sum(seu1$genetic_donor == "Doublet"))
batch2_doublet_doublet <- c(batch2_doublet_doublet, sum(seu2$genetic_donor == "Doublet" & seu2[[method]] == "Doublet") / sum(seu2$genetic_donor == "Doublet"))
batch2_doublet_negative <- c(batch2_doublet_negative, sum(seu2$genetic_donor == "Doublet" & seu2[[method]] == "Negative") / sum(seu2$genetic_donor == "Doublet"))
batch2_doublet_singlet <- c(batch2_doublet_singlet, sum(seu2$genetic_donor == "Doublet" & Reduce("|", lapply(HTO_list_batch2[1:8], function(x) seu2[[method]] == x))) / sum(seu2$genetic_donor == "Doublet"))
batch3_doublet_doublet <- c(batch3_doublet_doublet, sum(seu3$genetic_donor == "Doublet" & seu3[[method]] == "Doublet") / sum(seu3$genetic_donor == "Doublet"))
batch3_doublet_negative <- c(batch3_doublet_negative, sum(seu3$genetic_donor == "Doublet" & seu3[[method]] == "Negative") / sum(seu3$genetic_donor == "Doublet"))
batch3_doublet_singlet <- c(batch3_doublet_singlet, sum(seu3$genetic_donor == "Doublet" & Reduce("|", lapply(HTO_list_batch3[1:8], function(x) seu3[[method]] == x))) / sum(seu3$genetic_donor == "Doublet"))
}
names(batch1_doublet_doublet) <- method_calls
names(batch1_doublet_negative) <- method_calls
names(batch1_doublet_singlet) <- method_calls
names(batch2_doublet_doublet) <- method_calls
names(batch2_doublet_negative) <- method_calls
names(batch2_doublet_singlet) <- method_calls
names(batch3_doublet_doublet) <- method_calls
names(batch3_doublet_negative) <- method_calls
names(batch3_doublet_singlet) <- method_calls
batch1_doublet_assignments <- data.frame("method" = c("hashedDrops",
"hashedDrops (default)",
"HashSolo",
"HTODemux",
"GMM-Demux",
"deMULTIplex",
"BFF_raw",
"BFF_cluster",
"demuxmix"),
"Doublet" = batch1_doublet_doublet,
"Negative" = batch1_doublet_negative,
"Singlet" = batch1_doublet_singlet) %>%
pivot_longer(cols = c("Doublet", "Negative", "Singlet"),
names_to = "assignment",
values_to = "fraction")
batch2_doublet_assignments <- data.frame("method" = c("hashedDrops",
"hashedDrops (default)",
"HashSolo",
"HTODemux",
"GMM-Demux",
"deMULTIplex",
"BFF_raw",
"BFF_cluster",
"demuxmix"),
"Doublet" = batch2_doublet_doublet,
"Negative" = batch2_doublet_negative,
"Singlet" = batch2_doublet_singlet) %>%
pivot_longer(cols = c("Doublet", "Negative", "Singlet"),
names_to = "assignment",
values_to = "fraction")
batch3_doublet_assignments <- data.frame("method" = c("hashedDrops",
"hashedDrops (default)",
"HashSolo",
"HTODemux",
"GMM-Demux",
"deMULTIplex",
"BFF_raw",
"BFF_cluster",
"demuxmix"),
"Doublet" = batch3_doublet_doublet,
"Negative" = batch3_doublet_negative,
"Singlet" = batch3_doublet_singlet) %>%
pivot_longer(cols = c("Doublet", "Negative", "Singlet"),
names_to = "assignment",
values_to = "fraction")
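The nine parallel accumulator vectors above could be collapsed into a small helper; a minimal sketch, assuming a Seurat object with a `genetic_donor` metadata column and one metadata column per method (the helper name `doublet_fractions` is illustrative, not from the analysis):

```r
# Fraction of genetic doublets assigned to Doublet / Negative / a singlet HTO
# by a given demultiplexing method (hypothetical helper for illustration).
doublet_fractions <- function(seu, method, hto_list) {
  calls <- seu[[method]][, 1]          # metadata column of calls for this method
  is_gd <- seu$genetic_donor == "Doublet"
  c(Doublet  = sum(is_gd & calls == "Doublet") / sum(is_gd),
    Negative = sum(is_gd & calls == "Negative") / sum(is_gd),
    Singlet  = sum(is_gd & calls %in% hto_list) / sum(is_gd))
}

# e.g. one row of fractions per method for batch 1:
# sapply(method_calls, doublet_fractions,
#        seu = seu1, hto_list = HTO_list_batch1[1:8])
```

Here `calls %in% hto_list` expresses the same membership test as the `Reduce("|", lapply(...))` idiom in the loop above.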
doublet_colours <- c("black", "gray60", "firebrick1")
p1 <- ggplot(batch1_doublet_assignments %>%
mutate(method = factor(method, levels = c("BFF_cluster", "BFF_raw", "hashedDrops (default)", "deMULTIplex", "HashSolo", "GMM-Demux", "HTODemux", "demuxmix", "hashedDrops")))) +
geom_bar(aes(x = method, y = fraction, fill = assignment),
stat = "identity") +
ggtitle("Batch 1 (2041 doublets)") +
ylim(0, 1) +
scale_fill_manual(values = doublet_colours) +
theme(axis.ticks.x = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
plot.title = element_text(size = 10)
) + NoLegend() + coord_flip()
p2 <- ggplot(batch2_doublet_assignments %>%
mutate(method = factor(method, levels = c("BFF_cluster", "BFF_raw", "hashedDrops (default)", "deMULTIplex", "HashSolo", "GMM-Demux", "HTODemux", "demuxmix", "hashedDrops")))) +
geom_bar(aes(x = method, y = fraction, fill = assignment),
stat = "identity") +
ggtitle("Batch 2 (6463 doublets)") +
ylim(0, 1) +
scale_fill_manual(values = doublet_colours) +
theme(axis.ticks.x = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank(),
plot.title = element_text(size = 10)) + NoLegend() + coord_flip()
p3 <- ggplot(batch3_doublet_assignments %>%
mutate(method = factor(method, levels = c("BFF_cluster", "BFF_raw", "hashedDrops (default)", "deMULTIplex", "HashSolo", "GMM-Demux", "HTODemux", "demuxmix", "hashedDrops")))) +
geom_bar(aes(x = method, y = fraction, fill = assignment),
stat = "identity") +
ggtitle("Batch 3 (11004 doublets)") +
ylim(0, 1) +
scale_fill_manual(values = doublet_colours) +
theme(axis.ticks.x = element_blank(),
axis.text.y = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
plot.title = element_text(size = 10)) + coord_flip()
p1 | p2 | p3
#ggsave("doublet_assignments.png",
# plot = p1 | p2 | p3,
# device = "png",
# path = here("paper_latex", "figures"),
# width = 8, height = 5,
# units = "in",
# dpi = 350)
sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.0.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] Matrix_1.5-4 viridis_0.6.3
[3] viridisLite_0.4.2 tidyHeatmap_1.8.1
[5] ComplexHeatmap_2.14.0 demuxmix_1.0.0
[7] RColorBrewer_1.1-3 cellhashR_1.0.3
[9] dittoSeq_1.10.0 speckle_0.99.7
[11] pheatmap_1.0.12 SeuratObject_4.1.3
[13] Seurat_4.3.0 scater_1.26.1
[15] scuttle_1.8.4 lubridate_1.9.2
[17] forcats_1.0.0 stringr_1.5.0
[19] purrr_1.0.1 readr_2.1.4
[21] tidyr_1.3.0 tibble_3.2.1
[23] tidyverse_2.0.0 DropletUtils_1.18.1
[25] SingleCellExperiment_1.20.1 SummarizedExperiment_1.28.0
[27] Biobase_2.58.0 GenomicRanges_1.50.2
[29] GenomeInfoDb_1.34.9 IRanges_2.32.0
[31] S4Vectors_0.36.2 BiocGenerics_0.44.0
[33] MatrixGenerics_1.10.0 matrixStats_0.63.0
[35] patchwork_1.1.2 cowplot_1.1.1
[37] ggplot2_3.4.2 janitor_2.2.0
[39] dplyr_1.1.2 BiocStyle_2.26.0
[41] here_1.0.1 workflowr_1.7.0
loaded via a namespace (and not attached):
[1] scattermore_1.0 R.methodsS3_1.8.2
[3] knitr_1.42 irlba_2.3.5.1
[5] DelayedArray_0.24.0 R.utils_2.12.2
[7] data.table_1.14.8 RCurl_1.98-1.12
[9] doParallel_1.0.17 generics_0.1.3
[11] preprocessCore_1.61.0 ScaledMatrix_1.6.0
[13] callr_3.7.3 RANN_2.6.1
[15] future_1.32.0 tzdb_0.3.0
[17] spatstat.data_3.0-1 httpuv_1.6.11
[19] xfun_0.39 hms_1.1.3
[21] jquerylib_0.1.4 evaluate_0.21
[23] promises_1.2.0.1 fansi_1.0.4
[25] dendextend_1.17.1 igraph_1.4.2
[27] DBI_1.1.3 htmlwidgets_1.6.2
[29] spatstat.geom_3.2-1 ellipsis_0.3.2
[31] bookdown_0.34 deldir_1.0-6
[33] sparseMatrixStats_1.10.0 vctrs_0.6.2
[35] Cairo_1.6-0 ROCR_1.0-11
[37] abind_1.4-5 cachem_1.0.8
[39] withr_2.5.0 ggforce_0.4.1
[41] progressr_0.13.0 sctransform_0.3.5
[43] goftest_1.2-3 cluster_2.1.4
[45] lazyeval_0.2.2 crayon_1.5.2
[47] spatstat.explore_3.2-1 edgeR_3.40.2
[49] pkgconfig_2.0.3 labeling_0.4.2
[51] tweenr_2.0.2 nlme_3.1-162
[53] vipor_0.4.5 rlang_1.1.1
[55] globals_0.16.2 lifecycle_1.0.3
[57] miniUI_0.1.1.1 rsvd_1.0.5
[59] ggrastr_1.0.1 rprojroot_2.0.3
[61] polyclip_1.10-4 lmtest_0.9-40
[63] Rhdf5lib_1.20.0 zoo_1.8-12
[65] beeswarm_0.4.0 whisker_0.4.1
[67] ggridges_0.5.4 GlobalOptions_0.1.2
[69] processx_3.8.1 png_0.1-8
[71] rjson_0.2.21 bitops_1.0-7
[73] getPass_0.2-2 R.oo_1.25.0
[75] KernSmooth_2.23-21 rhdf5filters_1.10.1
[77] ggExtra_0.10.0 DelayedMatrixStats_1.20.0
[79] shape_1.4.6 parallelly_1.35.0
[81] spatstat.random_3.1-5 beachmat_2.14.2
[83] scales_1.2.1 magrittr_2.0.3
[85] plyr_1.8.8 ica_1.0-3
[87] zlibbioc_1.44.0 compiler_4.2.2
[89] dqrng_0.3.0 clue_0.3-64
[91] fitdistrplus_1.1-11 snakecase_0.11.0
[93] cli_3.6.1 XVector_0.38.0
[95] listenv_0.9.0 pbapply_1.7-0
[97] ps_1.7.5 MASS_7.3-60
[99] tidyselect_1.2.0 stringi_1.7.12
[101] highr_0.10 yaml_2.3.7
[103] BiocSingular_1.14.0 locfit_1.5-9.7
[105] ggrepel_0.9.3 sass_0.4.6
[107] tools_4.2.2 timechange_0.2.0
[109] future.apply_1.10.0 parallel_4.2.2
[111] circlize_0.4.15 rstudioapi_0.14
[113] foreach_1.5.2 git2r_0.32.0
[115] gridExtra_2.3 rmdformats_1.0.4
[117] farver_2.1.1 Rtsne_0.16
[119] digest_0.6.31 BiocManager_1.30.20
[121] shiny_1.7.4 Rcpp_1.0.10
[123] egg_0.4.5 later_1.3.1
[125] RcppAnnoy_0.0.20 httr_1.4.6
[127] naturalsort_0.1.3 colorspace_2.1-0
[129] fs_1.6.2 tensor_1.5
[131] reticulate_1.28 splines_4.2.2
[133] uwot_0.1.14 spatstat.utils_3.0-3
[135] sp_1.6-0 plotly_4.10.1
[137] xtable_1.8-4 jsonlite_1.8.4
[139] R6_2.5.1 pillar_1.9.0
[141] htmltools_0.5.5 mime_0.12
[143] glue_1.6.2 fastmap_1.1.1
[145] BiocParallel_1.32.6 BiocNeighbors_1.16.0
[147] codetools_0.2-19 utf8_1.2.3
[149] lattice_0.21-8 bslib_0.4.2
[151] spatstat.sparse_3.0-1 ggbeeswarm_0.7.2
[153] leiden_0.4.3 magick_2.7.4
[155] survival_3.5-5 limma_3.54.2
[157] rmarkdown_2.21 munsell_0.5.0
[159] GetoptLong_1.0.5 rhdf5_2.42.1
[161] GenomeInfoDbData_1.2.9 iterators_1.0.14
[163] HDF5Array_1.26.0 reshape2_1.4.4
[165] gtable_0.3.3