Last updated: 2023-07-05
Checks: 7 0
Knit directory: hashtag-demux-paper/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(20230522)
was run prior to running
the code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 9e9164b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for
the analysis have been committed to Git prior to generating the results
(you can use wflow_publish
or
wflow_git_commit
). workflowr only checks the R Markdown
file, but you know if there are other scripts or data files that it
depends on. Below is the status of the Git repository when the results
were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/figure/
Ignored: data/BAL_data/.DS_Store
Untracked files:
Untracked: .ipynb_checkpoints/
Untracked: Dear editors.docx
Untracked: NAR.cls
Untracked: analysis/Fscore_MCC_comparison.Rmd
Untracked: analysis/cell_line_analysis.Rmd
Untracked: analysis/gmm_demux.Rmd
Untracked: analysis/hashsolo_prep.Rmd
Untracked: analysis/run_GMM_demux_BAL.sh
Untracked: cover_letter.docx
Untracked: data/BAL_data/batch1_all_methods.SEU.rds
Untracked: data/BAL_data/batch1_c1_donors_original.csv
Untracked: data/BAL_data/batch1_c1_hto_counts_original.csv
Untracked: data/BAL_data/batch1_c1_relabelled.SEU.rds
Untracked: data/BAL_data/batch1_c2_donors_original.csv
Untracked: data/BAL_data/batch1_c2_hto_counts_original.csv
Untracked: data/BAL_data/batch1_c2_relabelled.SEU.rds
Untracked: data/BAL_data/batch1_relabelled.SEU.rds
Untracked: data/BAL_data/batch2_all_methods.SEU.rds
Untracked: data/BAL_data/batch2_c1_donors_original.csv
Untracked: data/BAL_data/batch2_c1_hto_counts_original.csv
Untracked: data/BAL_data/batch2_c1_relabelled.SEU.rds
Untracked: data/BAL_data/batch2_c2_donors_original.csv
Untracked: data/BAL_data/batch2_c2_hto_counts_original.csv
Untracked: data/BAL_data/batch2_c2_relabelled.SEU.rds
Untracked: data/BAL_data/batch2_relabelled.SEU.rds
Untracked: data/BAL_data/batch3_all_methods.SEU.rds
Untracked: data/BAL_data/batch3_c1_donors_original.csv
Untracked: data/BAL_data/batch3_c1_hto_counts_original.csv
Untracked: data/BAL_data/batch3_c1_relabelled.SEU.rds
Untracked: data/BAL_data/batch3_c2_donors_original.csv
Untracked: data/BAL_data/batch3_c2_hto_counts_original.csv
Untracked: data/BAL_data/batch3_c2_relabelled.SEU.rds
Untracked: data/BAL_data/batch3_relabelled.SEU.rds
Untracked: data/adata/batch1_c1_hashsolo.csv
Untracked: data/adata/batch1_c1_hs_n10_d10.csv
Untracked: data/adata/batch1_c1_hs_n10_d20.csv
Untracked: data/adata/batch1_c1_hs_n10_d30.csv
Untracked: data/adata/batch1_c1_hs_n1_d10.csv
Untracked: data/adata/batch1_c1_hs_n1_d20.csv
Untracked: data/adata/batch1_c1_hs_n1_d30.csv
Untracked: data/adata/batch1_c1_hs_n5_d10.csv
Untracked: data/adata/batch1_c1_hs_n5_d30.csv
Untracked: data/adata/batch1_c2_hashsolo.csv
Untracked: data/adata/batch1_c2_hs_n10_d10.csv
Untracked: data/adata/batch1_c2_hs_n10_d20.csv
Untracked: data/adata/batch1_c2_hs_n10_d30.csv
Untracked: data/adata/batch1_c2_hs_n1_d10.csv
Untracked: data/adata/batch1_c2_hs_n1_d20.csv
Untracked: data/adata/batch1_c2_hs_n1_d30.csv
Untracked: data/adata/batch1_c2_hs_n5_d10.csv
Untracked: data/adata/batch1_c2_hs_n5_d30.csv
Untracked: data/adata/batch2_c1_hashsolo.csv
Untracked: data/adata/batch2_c1_hs_n10_d10.csv
Untracked: data/adata/batch2_c1_hs_n10_d20.csv
Untracked: data/adata/batch2_c1_hs_n10_d30.csv
Untracked: data/adata/batch2_c1_hs_n1_d10.csv
Untracked: data/adata/batch2_c1_hs_n1_d20.csv
Untracked: data/adata/batch2_c1_hs_n1_d30.csv
Untracked: data/adata/batch2_c1_hs_n5_d10.csv
Untracked: data/adata/batch2_c1_hs_n5_d30.csv
Untracked: data/adata/batch2_c2_hashsolo.csv
Untracked: data/adata/batch2_c2_hs_n10_d10.csv
Untracked: data/adata/batch2_c2_hs_n10_d20.csv
Untracked: data/adata/batch2_c2_hs_n10_d30.csv
Untracked: data/adata/batch2_c2_hs_n1_d10.csv
Untracked: data/adata/batch2_c2_hs_n1_d20.csv
Untracked: data/adata/batch2_c2_hs_n1_d30.csv
Untracked: data/adata/batch2_c2_hs_n5_d10.csv
Untracked: data/adata/batch2_c2_hs_n5_d30.csv
Untracked: data/adata/batch3_c1_hashsolo.csv
Untracked: data/adata/batch3_c1_hs_n10_d10.csv
Untracked: data/adata/batch3_c1_hs_n10_d20.csv
Untracked: data/adata/batch3_c1_hs_n10_d30.csv
Untracked: data/adata/batch3_c1_hs_n1_d10.csv
Untracked: data/adata/batch3_c1_hs_n1_d20.csv
Untracked: data/adata/batch3_c1_hs_n1_d30.csv
Untracked: data/adata/batch3_c1_hs_n5_d10.csv
Untracked: data/adata/batch3_c1_hs_n5_d30.csv
Untracked: data/adata/batch3_c2_hashsolo.csv
Untracked: data/adata/batch3_c2_hs_n10_d10.csv
Untracked: data/adata/batch3_c2_hs_n10_d20.csv
Untracked: data/adata/batch3_c2_hs_n10_d30.csv
Untracked: data/adata/batch3_c2_hs_n1_d10.csv
Untracked: data/adata/batch3_c2_hs_n1_d20.csv
Untracked: data/adata/batch3_c2_hs_n1_d30.csv
Untracked: data/adata/batch3_c2_hs_n5_d10.csv
Untracked: data/adata/batch3_c2_hs_n5_d30.csv
Untracked: data/adata/solid_tissue_batch1_hashsolo.csv
Untracked: data/adata/solid_tissue_batch2_hashsolo.csv
Untracked: data/solid_tumor_data/
Untracked: figures/QC_plots_new.png
Untracked: figures/Users/
Untracked: filter_wrong_empties.Rmd
Untracked: iscb_long_abstract.docx
Untracked: iscb_long_abstract.pdf
Untracked: oup-authoring-template/
Untracked: output/mean_fscore_mcc.xlsx
Untracked: paper_latex/
Unstaged changes:
Modified: analysis/index.Rmd
Modified: data/.DS_Store
Deleted: data/GMM-Demux/SSD_mtx/barcodes.tsv.gz
Deleted: data/GMM-Demux/SSD_mtx/features.tsv.gz
Deleted: data/GMM-Demux/SSD_mtx/matrix.mtx.gz
Deleted: data/GMM-Demux/batch1_c1_hto_counts_transpose.csv
Deleted: data/GMM-Demux/batch1_c2_hto_counts_transpose.csv
Deleted: data/GMM-Demux/batch2_c1_hto_counts_transpose.csv
Deleted: data/GMM-Demux/batch2_c2_hto_counts_transpose.csv
Deleted: data/GMM-Demux/batch3_c1_hto_counts_transpose.csv
Deleted: data/GMM-Demux/batch3_c2_hto_counts_transpose.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c1/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_LMO_c1/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c1/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_LMO_c1/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c2/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_LMO_c2/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c2/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_LMO_c2/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c3/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_LMO_c3/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_LMO_c3/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_LMO_c3/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch1_c1/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch1_c1/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch1_c1/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch1_c1/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch1_c2/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch1_c2/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch1_c2/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch1_c2/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch2_c1/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch2_c1/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch2_c1/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch2_c1/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch2_c2/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch2_c2/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch2_c2/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch2_c2/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch3_c1/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch3_c1/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch3_c1/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch3_c1/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/gmm_out_batch3_c2/full_report/GMM_full.config
Deleted: data/GMM-Demux/gmm_out_batch3_c2/full_report/GMM_full.csv
Deleted: data/GMM-Demux/gmm_out_batch3_c2/simplified_report/GMM_simplified.config
Deleted: data/GMM-Demux/gmm_out_batch3_c2/simplified_report/GMM_simplified.csv
Deleted: data/GMM-Demux/lmo_counts_capture1_transpose.csv
Deleted: data/GMM-Demux/lmo_counts_capture2_transpose.csv
Deleted: data/GMM-Demux/lmo_counts_capture3_transpose.csv
Deleted: data/GMM-Demux/run_GMM_demux_BAL.sh
Deleted: data/GMM-Demux/run_GMM_demux_LMO.sh
Modified: data/adata/batch1_HTOs.csv
Modified: data/adata/batch1_c1_barcodes.csv
Modified: data/adata/batch1_c1_counts.mtx
Modified: data/adata/batch1_c2_barcodes.csv
Modified: data/adata/batch1_c2_counts.mtx
Modified: data/adata/batch2_HTOs.csv
Modified: data/adata/batch2_c1_barcodes.csv
Modified: data/adata/batch2_c1_counts.mtx
Modified: data/adata/batch2_c2_barcodes.csv
Modified: data/adata/batch2_c2_counts.mtx
Modified: data/adata/batch3_HTOs.csv
Modified: data/adata/batch3_c1_barcodes.csv
Modified: data/adata/batch3_c1_counts.mtx
Modified: data/adata/batch3_c2_barcodes.csv
Modified: data/adata/batch3_c2_counts.mtx
Deleted: data/batch1_c1_donors.csv
Deleted: data/batch1_c1_hto_counts.csv
Deleted: data/batch1_c2_donors.csv
Deleted: data/batch1_c2_hto_counts.csv
Deleted: data/batch1_hto_counts.csv
Deleted: data/batch2_c1_donors.csv
Deleted: data/batch2_c1_hto_counts.csv
Deleted: data/batch2_c2_donors.csv
Deleted: data/batch2_c2_hto_counts.csv
Deleted: data/batch2_hto_counts.csv
Deleted: data/batch3_c1_donors.csv
Deleted: data/batch3_c1_hto_counts.csv
Deleted: data/batch3_c2_donors.csv
Deleted: data/batch3_c2_hto_counts.csv
Deleted: data/batch3_hto_counts.csv
Deleted: data/lmo_counts.csv
Deleted: data/lmo_counts_capture1.csv
Deleted: data/lmo_counts_capture2.csv
Deleted: data/lmo_counts_capture3.csv
Deleted: data/lmo_donors.csv
Deleted: data/lmo_donors_capture1.csv
Deleted: data/lmo_donors_capture2.csv
Deleted: data/lmo_donors_capture3.csv
Modified: figures/QC_plots.png
Modified: figures/category_fractions.png
Modified: hashsolo_calls.ipynb
Modified: notebook_for_paper.Rmd
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were
made to the R Markdown (analysis/solid_tissue_analysis.Rmd
)
and HTML (docs/solid_tissue_analysis.html
) files. If you’ve
configured a remote Git repository (see ?wflow_git_remote
),
click on the hyperlinks in the table below to view the files as they
were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 9e9164b | George Howitt | 2023-07-05 | wflow_publish("analysis/solid_tissue_analysis.Rmd") |
#Load libraries
suppressPackageStartupMessages({
library(here)
library(BiocStyle)
library(dplyr)
library(janitor)
library(ggplot2)
library(cowplot)
library(patchwork)
library(DropletUtils)
library(tidyverse)
library(scuttle)
library(scater)
library(Seurat)
library(pheatmap)
library(speckle)
library(dittoSeq)
library(cellhashR)
library(RColorBrewer)
library(demuxmix)
library(ComplexHeatmap)
library(tidyHeatmap)
library(viridis)
})
This notebook contains the analysis code for all results and figures relating to the cell line data set in the paper “Benchmarking single-cell hashtag oligo demultiplexing methods”.
This data consists of eight samples where scRNA-seq was performed in two batches of four samples each. on ovarian tumours.
Data is available here https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002262.v2.p1&phv=436020&phd=&pha=&pht=10575&phvf=&phdf=&phaf=&phtf=&dssp=1&consent=&temp=1
hashtag_counts_b1 <- read.csv(here("data", "solid_tumor_data", "ovarian_tumour_b1.csv"),
check.names = FALSE, row.names = 1)
hashtag_counts_b2 <- read.csv(here("data", "solid_tumor_data", "ovarian_tumour_b2.csv"),
check.names = FALSE, row.names = 1)
donors_b1 <- read.csv(here("data", "solid_tumor_data", "ovarian_tumour_donors_b1.csv"),
row.names = 1)
donors_b2 <- read.csv(here("data", "solid_tumor_data", "ovarian_tumour_donors_b2.csv"),
row.names = 1)
seu_b1 <- CreateSeuratObject(counts = hashtag_counts_b1, assay = "HTO")
seu_b2 <- CreateSeuratObject(counts = hashtag_counts_b2, assay = "HTO")
seu_b1$Barcode <- colnames(seu_b1)
seu_b2$Barcode <- colnames(seu_b2)
Add genetic donor information to Seurat objects
seu_b1$genetic_donor <- donors_b1$genetic_donors
seu_b2$genetic_donor <- donors_b2$genetic_donors
Useful lists
hashtag_list_b1 <- c("OT 01",
"OT 02",
"OT 03",
"OT 04")
donor_hashtag_list_b1 <- list("OT A" = "OT 01",
"OT B" = "OT 02",
"OT C" = "OT 03",
"OT D" = "OT 04",
"Doublet" = "Doublet",
"Negative" = "Negative")
hashtag_donor_list_b1 <- list("OT 01" = "OT A",
"OT 02" = "OT B",
"OT 03" = "OT C",
"OT 04" = "OT D",
"Doublet" = "Doublet",
"Negative" = "Negative")
hashtag_list_b2 <- c("OT 05",
"OT 06",
"OT 07",
"OT 08")
donor_hashtag_list_b2 <- list("OT E" = "OT 05",
"OT F" = "OT 06",
"OT G" = "OT 07",
"OT H" = "OT 08",
"Doublet" = "Doublet",
"Negative" = "Negative")
hashtag_donor_list_b2 <- list("OT 05" = "OT E",
"OT 06" = "OT F",
"OT 07" = "OT G",
"OT 08" = "OT H",
"Doublet" = "Doublet",
"Negative" = "Negative")
DefaultAssay(seu_b1) <- "HTO"
seu_b1 <- NormalizeData(seu_b1, assay = "HTO", normalization.method = "CLR")
Normalizing across features
seu_b1 <- ScaleData(seu_b1, features = rownames(seu_b1),
verbose = FALSE)
seu_b1 <- RunPCA(seu_b1, features = rownames(seu_b1), approx = FALSE, verbose = FALSE)
seu_b1 <- RunTSNE(seu_b1, dims = 1:3, perplexity = 100, check_duplicates = FALSE, verbose = FALSE)
DefaultAssay(seu_b2) <- "HTO"
seu_b2 <- NormalizeData(seu_b2, assay = "HTO", normalization.method = "CLR")
Normalizing across features
seu_b2 <- ScaleData(seu_b2, features = rownames(seu_b2),
verbose = FALSE)
seu_b2 <- RunPCA(seu_b2, features = rownames(seu_b2), approx = FALSE, verbose = FALSE)
seu_b2 <- RunTSNE(seu_b2, dims = 1:3, perplexity = 100, check_duplicates = FALSE, verbose = FALSE)
Density plots per barcode. In ideal conditions the density of the hashtag counts should appear bimodal, with a lower peak corresponding to the background and the higher peak corresponding to the signal.
df <- as.data.frame(t(seu_b1[["HTO"]]@counts))
df %>%
pivot_longer(cols = starts_with("OT")) %>%
mutate(logged = log(value + 1)) %>%
ggplot(aes(x = logged)) +
xlab("log(counts)") +
xlim(3,10) +
geom_density(adjust = 2) +
facet_wrap(~name, scales = "fixed", ncol = 2) -> p1
df <- as.data.frame(t(seu_b2[["HTO"]]@counts))
df %>%
pivot_longer(cols = starts_with("OT")) %>%
mutate(logged = log(value + 1)) %>%
ggplot(aes(x = logged)) +
xlab("log(counts)") +
xlim(3,10) +
geom_density(adjust = 2) +
facet_wrap(~name, scales = "fixed", ncol = 2) -> p2
p1 / p2
Warning: Removed 188 rows containing non-finite values (`stat_density()`).
Warning: Removed 14885 rows containing non-finite values (`stat_density()`).
These data aren’t looking so hot. Hard to see a second peak in most hashtags…
p3 <- DimPlot(seu_b1, group.by = "genetic_donor") +
ggtitle("Batch 1")
p4 <- DimPlot(seu_b2, group.by = "genetic_donor") +
ggtitle("Batch 2")
p3 / p4
(((p1 / p2 ) | (p3 / p4)) + plot_annotation(tag_levels = 'a')) &
theme(plot.title = element_text(face = "plain", size = 10),
plot.tag = element_text(face = 'plain'))
Warning: Removed 188 rows containing non-finite values (`stat_density()`).
Warning: Removed 14885 rows containing non-finite values (`stat_density()`).
This function creates a list of hashedDrops calls. Its defaults are the same as hashedDrops
create_hashedDrops_factor <- function(seurat_object, confident.min = 2,
doublet.nmads = 3, doublet.min = 2) {
hto_counts <- GetAssayData(seurat_object[["HTO"]], slot = "counts")
hash_stats <- DropletUtils::hashedDrops(hto_counts, confident.min = confident.min,
doublet.nmads = doublet.nmads, doublet.min = doublet.min)
hash_stats$Best <- rownames(seurat_object[["HTO"]])[hash_stats$Best]
hash_stats$Second <- rownames(seurat_object[["HTO"]])[hash_stats$Second]
HTO_assignments <- factor(case_when(
hash_stats$Confident == TRUE ~ hash_stats$Best,
hash_stats$Doublet == TRUE ~ "Doublet",
TRUE ~ "Negative"))
return(HTO_assignments)
}
Making factors with best parameters
seu_b1$hashedDrops_calls <- create_hashedDrops_factor(seu_b1, confident.min = 0.5)
seu_b2$hashedDrops_calls <- create_hashedDrops_factor(seu_b2, confident.min = 0.5)
Now with default parameters
seu_b1$hashedDrops_default_calls <- create_hashedDrops_factor(seu_b1)
seu_b2$hashedDrops_default_calls <- create_hashedDrops_factor(seu_b2)
HashSolo is a scanpy program. Needs a bit of prep Write to anndata compatible files Counts
library(Matrix)
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
The following object is masked from 'package:S4Vectors':
expand
writeMM(seu_b1@assays$HTO@counts, here("data", "solid_tumor_data", "adata", "b1_counts.mtx"))
NULL
writeMM(seu_b2@assays$HTO@counts, here("data", "solid_tumor_data", "adata", "b2_counts.mtx"))
NULL
Barcodes
barcodes <- data.frame(colnames(seu_b1))
colnames(barcodes)<-'Barcode'
write.csv(barcodes, here("data", "solid_tumor_data", "adata", "b1_barcodes.csv"),
quote = FALSE,row.names = FALSE)
barcodes <- data.frame(colnames(seu_b2))
colnames(barcodes)<-'Barcode'
write.csv(barcodes, here("data", "solid_tumor_data", "adata", "b2_barcodes.csv"),
quote = FALSE,row.names = FALSE)
Save hashtag names
HTOs_b1 <- data.frame(rownames(seu_b1))
colnames(HTOs_b1) <- 'HTO'
write.csv(HTOs_b1, here("data", "solid_tumor_data", "adata", "HTOs_b1.csv"),
quote = FALSE,row.names = FALSE)
HTOs_b2 <- data.frame(rownames(seu_b2))
colnames(HTOs_b2) <- 'HTO'
write.csv(HTOs_b2, here("data", "solid_tumor_data", "adata", "HTOs_b2.csv"),
quote = FALSE,row.names = FALSE)
See hashsolo_calls.ipynb for how we get these assignments
seu_b1$hashsolo_calls <- read.csv(here("data", "solid_tumor_data", "adata", "b1_hashsolo.csv"))$Classification
seu_b2$hashsolo_calls <- read.csv(here("data", "solid_tumor_data", "adata", "b2_hashsolo.csv"))$Classification
seu_b1$deMULTIplex_calls <- MULTIseqDemux(seu_b1, autoThresh = TRUE)$MULTI_ID
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
No threshold found for OT 03...
Iteration 1
Using quantile 0.1
No threshold found for OT 03...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
No threshold found for OT 04...
Iteration 2
Using quantile 0.1
No threshold found for OT 04...
Iteration 3
Using quantile 0.1
Iteration 4
Using quantile 0.1
Iteration 5
Using quantile 0.3
seu_b2$deMULTIplex_calls <- MULTIseqDemux(seu_b2, autoThresh = TRUE)$MULTI_ID
Iteration 1
Using quantile 0.2
Iteration 2
Using quantile 0.2
Iteration 3
Using quantile 0.2
HDmux <- HTODemux(seu_b1)
Cutoff for OT 01 : 422 reads
Cutoff for OT 02 : 910 reads
Cutoff for OT 03 : 188 reads
Cutoff for OT 04 : 352 reads
seu_b1$HTODemux_calls <- HDmux$hash.ID
HDmux <- HTODemux(seu_b2)
Cutoff for OT 05 : 35 reads
Cutoff for OT 06 : 31 reads
Cutoff for OT 07 : 876 reads
Cutoff for OT 08 : 166 reads
seu_b2$HTODemux_calls <- HDmux$hash.ID
GMM-Demux is run on the command line and needs a function to read in the results and format them all properly.
create_gmm_demux_factor <- function(seu, GMM_path, hto_list) {
#Read in output, have to use the "full" report, not the simplified one.
calls <- read.csv(paste0(GMM_path, "/GMM_full.csv"), row.names = 1)
#Read in names of clusters
cluster_names <- read.table(paste0(GMM_path, "/GMM_full.config"), sep = ",")
names(cluster_names) <- c("Cluster_id", "assignment")
#Need to fix the formatting of the assignment names, for some reason there's a leading space.
cluster_names$assignment <- gsub(x = cluster_names$assignment, pattern = '^ ', replacement = '')
#Add cell barcodes
calls$Barcode <- rownames(calls)
calls <- merge(calls, cluster_names, by = "Cluster_id", sort = FALSE)
#Need to re-order after merge for some reason
calls <- calls[order(match(calls$Barcode, names(seu$Barcode))), ]
#Rename the negative cluster for consistency
calls$assignment[calls$assignment == "negative"] <- "Negative"
#Put all the multiplet states into one assignment category
calls$assignment[!calls$assignment %in% c("Negative", hto_list)] <- "Doublet"
return(as.factor(calls$assignment))
}
Write the transpose of the counts matrix for GMM-Demux
write.csv(t(hashtag_counts_b1), here("data", "solid_tumor_data", "GMM-Demux", "hashtag_counts_b1_transpose.csv"))
write.csv(t(hashtag_counts_b2), here("data", "solid_tumor_data", "GMM-Demux", "hashtag_counts_b2_transpose.csv"))
Run the script run_GMM_demux_solid_tumor.sh in command line.
Add to objects
seu_b1$GMMDemux_calls <- create_gmm_demux_factor(seu_b1, here("data", "solid_tumor_data", "GMM-Demux", "gmm_out_b1", "full_report"), hashtag_list_b1)
seu_b2$GMMDemux_calls <- create_gmm_demux_factor(seu_b2, here("data", "solid_tumor_data", "GMM-Demux", "gmm_out_b2", "full_report"), hashtag_list_b2)
cellhashR_calls <- GenerateCellHashingCalls(barcodeMatrix = hashtag_counts_b1,
methods = c("bff_raw", "bff_cluster"),
doTSNE = FALSE, doHeatmap = FALSE)
[1] "Converting input data.frame to a matrix"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 4"
[1] "Running BFF_raw"
[1] "Only one peak found, using max value as cutoff: OT 01"
[1] "Only one peak found, using max value as cutoff: OT 02"
[1] "Only one peak found, using max value as cutoff: OT 03"
[1] "Only one peak found, using max value as cutoff: OT 04"
[1] "Thresholds:"
[1] "OT 04: 182082.316992221"
[1] "OT 03: 112127.839767934"
[1] "OT 02: 214644.500496396"
[1] "OT 01: 31165.7553663721"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 4"
[1] "Running BFF_cluster"
[1] "Doublet threshold: 0.05"
[1] "Neg threshold: 0.05"
[1] "Min distance as fraction of distance between peaks: 0.1"
[1] "Only one peak found, using max value as cutoff: OT 01"
[1] "Only one peak found, using max value as cutoff: OT 02"
[1] "Only one peak found, using max value as cutoff: OT 03"
[1] "Only one peak found, using max value as cutoff: OT 04"
[1] "Thresholds:"
[1] "OT 04: 182082.316992221"
[1] "OT 03: 112127.839767934"
[1] "OT 02: 214644.500496396"
[1] "OT 01: 31165.7553663721"
[1] "Only one peak found, using max value as cutoff: OT 01"
[1] "Only one peak found, using max value as cutoff: OT 02"
[1] "Only one peak found, using max value as cutoff: OT 03"
[1] "Only one peak found, using max value as cutoff: OT 04"
[1] "Error running BFF"
[1] "Generating consensus calls"
[1] "adding missing method: bff_cluster"
[1] "Consensus calls will be generated using: bff_raw,bff_cluster"
[1] "Total concordant: 12510"
[1] "Total discordant: 0 (0%)"
seu_b1$BFF_raw_calls <- cellhashR_calls$bff_raw
seu_b1$BFF_cluster_calls <- cellhashR_calls$bff_cluster
cellhashR_calls <- GenerateCellHashingCalls(barcodeMatrix = hashtag_counts_b2,
methods = c("bff_raw", "bff_cluster"),
doTSNE = FALSE, doHeatmap = FALSE)
[1] "Converting input data.frame to a matrix"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 4"
[1] "Running BFF_raw"
[1] "Only one peak found, using max value as cutoff: OT 07"
[1] "Thresholds:"
[1] "OT 08: 215.227799018146"
[1] "OT 07: 86433.9145009996"
[1] "OT 06: 54.6952290109095"
[1] "OT 05: 58.5737138510189"
[1] "Starting BFF"
[1] "rows dropped for low counts: 0 of 4"
[1] "Running BFF_cluster"
[1] "Doublet threshold: 0.05"
[1] "Neg threshold: 0.05"
[1] "Min distance as fraction of distance between peaks: 0.1"
[1] "Only one peak found, using max value as cutoff: OT 07"
[1] "Thresholds:"
[1] "OT 08: 215.227799018146"
[1] "OT 07: 86433.9145009996"
[1] "OT 06: 54.6952290109095"
[1] "OT 05: 58.5737138510189"
[1] "Only one peak found, using max value as cutoff: OT 07"
[1] "Smoothing parameter j = 5"
[1] "Smoothing parameter j = 10"
[1] "Smoothing parameter j = 15"
[1] "Smoothing parameter j = 20"
[1] "Generating consensus calls"
[1] "Consensus calls will be generated using: bff_raw,bff_cluster"
[1] "Total concordant: 7352"
[1] "Total discordant: 2195 (22.99%)"
seu_b2$BFF_raw_calls <- cellhashR_calls$bff_raw
seu_b2$BFF_cluster_calls <- cellhashR_calls$bff_cluster
seu_b1$BFF_cluster_calls <- case_when(seu_b1$BFF_cluster_calls == "Not Called" ~ "Negative")
This function turns the output of demuxmix into something consistent with the other methods
demuxmix_calls_consistent <- function(seurat_object, model = "naive", hto_list) {
hto_counts <- as.matrix(GetAssayData(seurat_object[["HTO"]], slot = "counts"))
dmm <- demuxmix(hto_counts, model = model)
dmm_calls <- dmmClassify(dmm)
calls_out <- case_when(dmm_calls$HTO %in% hto_list ~ dmm_calls$HTO,
!dmm_calls$HTO %in% hto_list ~ case_when(
dmm_calls$Type == "multiplet" ~ "Doublet",
dmm_calls$Type %in% c("negative", "uncertain") ~ "Negative")
)
return(as.factor(calls_out))
}
seu_b1$demuxmix_calls <- demuxmix_calls_consistent(seu_b1, hto_list = hashtag_list_b1)
seu_b2$demuxmix_calls <- demuxmix_calls_consistent(seu_b2, hto_list = hashtag_list_b2)
Save Seurat objects with all the hashtag assignments
#saveRDS(seu_b1, here("data", "solid_tumor_data", "batch1_all_methods.SEU.rds"))
#saveRDS(seu_b2, here("data", "solid_tumor_data", "batch2_all_methods.SEU.rds"))
Looking at what fraction of droplets are assigned as singlets, doublets or negative by each method
method_calls <- c("hashedDrops_calls",
"hashedDrops_default_calls",
"hashsolo_calls",
"HTODemux_calls",
"GMMDemux_calls",
"deMULTIplex_calls",
"BFF_raw_calls",
"BFF_cluster_calls",
"demuxmix_calls")
for (method in method_calls){
seu_b1[[gsub(method, paste0(method, "_donors"), method)]] <- as.factor(unlist(hashtag_list_b1[unlist(seu_b1[[method]])]))
seu_b2[[gsub(method, paste0(method, "_donors"), method)]] <- as.factor(unlist(hashtag_list_b2[unlist(seu_b2[[method]])]))
}
seu_b1$Batch <- "Batch 1"
seu_b2$Batch <- "Batch 2"
seu <- merge(seu_b1, seu_b2)
Warning in CheckDuplicateCellNames(object.list = objects): Some cell names are
duplicated across objects provided. Renaming to enforce unique cell names.
seu$Batch <- as.factor(seu$Batch)
Need to make a factor which reduces the assignments to one of (“Singlet”, “Doublet”, “Negative”).
sd_or_u_hashtags <- function(seurat_object, method) {
return(case_when(seurat_object[[method]] == "Doublet" ~ "Doublet",
seurat_object[[method]] == "Negative" ~ "Negative",
TRUE ~ "Singlet"))
}
for (method in method_calls) {
seu[[gsub(method, paste0(method, "_category"), method)]] <- sd_or_u_hashtags(seu, method)
}
seu$genetic_donor_category <- case_when(seu$genetic_donor == "Doublet" ~ "Doublet",
seu$genetic_donor == "Negative" ~ "Negative",
TRUE ~ "Singlet")
p1 <- dittoBarPlot(seu, var = "genetic_donor_category", group.by = "Batch") + NoLegend() +
ggtitle("vireo (genetics)") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank())
p2 <- dittoBarPlot(seu, var = "BFF_cluster_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("BFF_cluster") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank(), axis.text.y = element_blank())
p3 <- dittoBarPlot(seu, var = "BFF_raw_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("BFF_raw") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank())
p4 <- dittoBarPlot(seu, var = "deMULTIplex_calls_category", group.by = "Batch") +
ggtitle("deMULTIplex") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank(), axis.text.y = element_blank())
p5 <- dittoBarPlot(seu, var = "demuxmix_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("demuxmix") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank())
p6 <- dittoBarPlot(seu, var = "GMMDemux_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("GMM-Demux") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank(), axis.text.y = element_blank())
p7 <- dittoBarPlot(seu, var = "hashedDrops_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("hashedDrops - best") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank())
p8 <- dittoBarPlot(seu, var = "hashedDrops_default_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("hashedDrops - default") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_blank(), axis.text.y = element_blank())
p9 <- dittoBarPlot(seu, var = "hashsolo_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("HashSolo") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_text(size = 14))
p10 <- dittoBarPlot(seu, var = "HTODemux_calls_category", group.by = "Batch") + NoLegend() +
ggtitle("HTODemux") +
theme(axis.title.x = element_blank(), axis.title.y = element_blank(),
axis.text.x = element_text(size = 14), axis.text.y = element_blank())
(p1 + p2) / (p3 + p4) / (p5 + p6) / (p7 + p8) / (p9 + p10)
calculate_HTO_fscore <- function(seurat_object, donor_hto_list, method) {
calls <- seurat_object[[method]]
f <- NULL
for (HTO in donor_hto_list) {
tp <- sum(calls == HTO & donor_hto_list[seurat_object$genetic_donor] == HTO) #True positive rate
fp <- sum(calls == HTO & donor_hto_list[seurat_object$genetic_donor] != HTO) #False positive rate
fn <- sum(calls != HTO & donor_hto_list[seurat_object$genetic_donor] == HTO) #False negative rate
f <- c(f, tp / (tp + 0.5 * (fp + fn)))
}
# f <- c(f, median(f)) #Add median F score
f <- c(f, mean(f)) #Add mean F score
names(f) <- c(donor_hto_list, "Average")
return(f)
}
Fscore_hashedDrops_b1 <- calculate_HTO_fscore(seu_b1, donor_hashtag_list_b1[1:4], "hashedDrops_calls")
Fscore_hashedDrops_b2 <- calculate_HTO_fscore(seu_b2, donor_hashtag_list_b2[1:4], "hashedDrops_calls")
Fscore_hashedDrops_default_b1 <- calculate_HTO_fscore(seu_b1, donor_hashtag_list_b1[1:4], "hashedDrops_default_calls")
Fscore_hashedDrops_default_b2 <- calculate_HTO_fscore(seu_b2, donor_hashtag_list_b2[1:4], "hashedDrops_default_calls")
Fscore_hashsolo_b1 <- calculate_HTO_fscore(seu_b1, donor_hashtag_list_b1[1:4], "hashsolo_calls")
Fscore_hashsolo_b2 <- calculate_HTO_fscore(seu_b2, donor_hashtag_list_b2[1:4], "hashsolo_calls")
Fscore_HTODemux_b1 <- calculate_HTO_fscore(seu_b1, donor_hashtag_list_b1[1:4], "HTODemux_calls")
Fscore_HTODemux_b2 <- calculate_HTO_fscore(seu_b2, donor_hashtag_list_b2[1:4], "HTODemux_calls")
Fscore_GMMDemux_b1 <- calculate_HTO_fscore(seu_b1, donor_hashtag_list_b1[1:4], "GMMDemux_calls")
Fscore_GMMDemux_b2 <- calculate_HTO_fscore(seu_b2, donor_hashtag_list_b2[1:4], "GMMDemux_calls")
Fscore_deMULTIplex_b1 <- calculate_HTO_fscore(seu_b1, donor_hashtag_list_b1[1:4], "deMULTIplex_calls")
Fscore_deMULTIplex_b2 <- calculate_HTO_fscore(seu_b2, donor_hashtag_list_b2[1:4], "deMULTIplex_calls")
Fscore_BFF_raw_b1 <- calculate_HTO_fscore(seu_b1, donor_hashtag_list_b1[1:4], "BFF_raw_calls")
Fscore_BFF_raw_b2 <- calculate_HTO_fscore(seu_b2, donor_hashtag_list_b2[1:4], "BFF_raw_calls")
Fscore_BFF_cluster_b1 <- calculate_HTO_fscore(seu_b1, donor_hashtag_list_b1[1:4], "BFF_cluster_calls")
Fscore_BFF_cluster_b2 <- calculate_HTO_fscore(seu_b2, donor_hashtag_list_b2[1:4], "BFF_cluster_calls")
Fscore_demuxmix_b1 <- calculate_HTO_fscore(seu_b1, donor_hashtag_list_b1[1:4], "demuxmix_calls")
Fscore_demuxmix_b2 <- calculate_HTO_fscore(seu_b2, donor_hashtag_list_b2[1:4], "demuxmix_calls")
Fscore_matrix_b1 <- data.frame("Hashtag" = c(hashtag_list_b1[1:4], "Mean"),
"hashedDrops" = Fscore_hashedDrops_b1,
"hashedDrops_default" = Fscore_hashedDrops_default_b1,
"HashSolo" = Fscore_hashsolo_b1,
"HTODemux" = Fscore_HTODemux_b1,
"GMM_Demux" = Fscore_GMMDemux_b1,
"deMULTIplex" = Fscore_deMULTIplex_b1,
"BFF_raw" = Fscore_BFF_raw_b1,
"BFF_cluster" = Fscore_BFF_cluster_b1,
"demuxmix" = Fscore_demuxmix_b1)
Fscore_matrix_b1
Hashtag hashedDrops hashedDrops_default HashSolo HTODemux GMM_Demux
OT 01 OT 01 0.3878205 0.3508772 0.3773585 0.3414634 0.3529412
OT 02 OT 02 0.8148297 0.6387852 0.7554054 0.6926214 0.7661009
OT 03 OT 03 0.7790672 0.3617088 0.6282509 0.7844753 0.7506263
OT 04 OT 04 0.7350051 0.2707768 0.6620463 0.6610695 0.7087034
Average Mean 0.6791806 0.4055370 0.6057653 0.6199074 0.6445929
deMULTIplex BFF_raw BFF_cluster demuxmix
OT 01 0.35600000 0 0 0.3066955
OT 02 0.32425422 0 0 0.6209396
OT 03 0.08955224 0 0 0.5673913
OT 04 0.10065189 0 0 0.5490760
Average 0.21761459 0 0 0.5110256
#Removing average information
Fscore_matrix_b1 = Fscore_matrix_b1[1:4,]
Fscore_matrix_b2 <- data.frame("Hashtag" = c(hashtag_list_b2[1:4], "Mean"),
"hashedDrops" = Fscore_hashedDrops_b2,
"hashedDrops_default" = Fscore_hashedDrops_default_b2,
"HashSolo" = Fscore_hashsolo_b2,
"HTODemux" = Fscore_HTODemux_b2,
"GMM_Demux" = Fscore_GMMDemux_b2,
"deMULTIplex" = Fscore_deMULTIplex_b2,
"BFF_raw" = Fscore_BFF_raw_b2,
"BFF_cluster" = Fscore_BFF_cluster_b2,
"demuxmix" = Fscore_demuxmix_b2)
Fscore_matrix_b2
Hashtag hashedDrops hashedDrops_default HashSolo HTODemux GMM_Demux
OT 05 OT 05 0.7832734 0.3160055 0.4639805 0.8184874 0.7510431
OT 06 OT 06 0.8346273 0.4904387 0.6341223 0.8396890 0.7891936
OT 07 OT 07 0.8075230 0.3776145 0.8724261 0.1335944 0.2358240
OT 08 OT 08 0.8257874 0.6737542 0.7803532 0.8230288 0.7537453
Average Mean 0.8128028 0.4644532 0.6877205 0.6536999 0.6324515
deMULTIplex BFF_raw BFF_cluster demuxmix
OT 05 0.7888938 0.8201499 0.7037284 0.4491315
OT 06 0.8063492 0.8550619 0.7051852 0.4615385
OT 07 0.3422171 0.0000000 0.2221590 0.4526467
OT 08 0.7998987 0.8186554 0.0000000 0.6608187
Average 0.6843397 0.6234668 0.4077681 0.5060339
#Removing average information
Fscore_matrix_b2 = Fscore_matrix_b2[1:4,]
Fscore_matrix_b1 %>%
pivot_longer(cols = c("hashedDrops", "hashedDrops_default", "HashSolo",
"HTODemux", "GMM_Demux", "deMULTIplex",
"BFF_raw", "BFF_cluster", "demuxmix"),
names_to = "method",
values_to = "Fscore") -> Fscore_matrix_b1
Fscore_matrix_b2 %>%
pivot_longer(cols = c("hashedDrops", "hashedDrops_default", "HashSolo",
"HTODemux", "GMM_Demux", "deMULTIplex",
"BFF_raw", "BFF_cluster", "demuxmix"),
names_to = "method",
values_to = "Fscore") -> Fscore_matrix_b2
Fscore_matrix_b1$Batch <- "Batch 1"
Fscore_matrix_b2$Batch <- "Batch 2"
Fscore_matrix <- bind_rows(Fscore_matrix_b1,
Fscore_matrix_b2)
Fscore_matrix %>%
group_by(Batch) %>%
heatmap(.row = method,
.column = Hashtag,
.value = Fscore,
column_title = "F-score - solid tissue data",
cluster_rows = TRUE,
row_names_gp = gpar(fontsize = 10),
show_row_dend = FALSE,
row_names_side = "left",
row_title = "",
cluster_columns = FALSE,
column_names_gp = gpar(fontsize = 10),
palette_value = plasma(3)) -> p1
tidyHeatmap says: (once per session) from release 1.7.0 the scaling is set to "none" by default. Please use scale = "row", "column" or "both" to apply scaling
p1 <- wrap_heatmap(p1)
p1
# ggsave(here("paper_latex", "figures", "OT_Fscore.png"),
# p1,
# device = "png",
# width = 8, height = 5,
# units = "in",
# dpi = 350
# )
Also worried about whether any of these methods are assigning genetic doublets as singlets. Going to look at the fraction of genetic doublets that get assigned to each of the possible HTO categories.
b1_doublets <- seu_b1[, seu_b1$genetic_donor == "Doublet"]
b2_doublets <- seu_b2[, seu_b2$genetic_donor == "Doublet"]
b1_doublet_doublet <- NULL
b1_doublet_negative <- NULL
b1_doublet_singlet <- NULL
b2_doublet_doublet <- NULL
b2_doublet_negative <- NULL
b2_doublet_singlet <- NULL
for (method in method_calls) {
b1_doublet_doublet <- c(b1_doublet_doublet, sum(seu_b1$genetic_donor == "Doublet" & seu_b1[[method]] == "Doublet") / sum(seu_b1$genetic_donor == "Doublet"))
b1_doublet_negative <- c(b1_doublet_negative, sum(seu_b1$genetic_donor == "Doublet" & seu_b1[[method]] == "Negative") / sum(seu_b1$genetic_donor == "Doublet"))
b1_doublet_singlet <- c(b1_doublet_singlet, sum(seu_b1$genetic_donor == "Doublet" & Reduce("|", lapply(hashtag_list_b1[1:4], function(x) seu_b1[[method]] == x))) / sum(seu_b1$genetic_donor == "Doublet"))
b2_doublet_doublet <- c(b2_doublet_doublet, sum(seu_b2$genetic_donor == "Doublet" & seu_b2[[method]] == "Doublet") / sum(seu_b2$genetic_donor == "Doublet"))
b2_doublet_negative <- c(b2_doublet_negative, sum(seu_b2$genetic_donor == "Doublet" & seu_b2[[method]] == "Negative") / sum(seu_b2$genetic_donor == "Doublet"))
b2_doublet_singlet <- c(b2_doublet_singlet, sum(seu_b2$genetic_donor == "Doublet" & Reduce("|", lapply(hashtag_list_b2[1:4], function(x) seu_b2[[method]] == x))) / sum(seu_b2$genetic_donor == "Doublet"))
}
names(b1_doublet_doublet) <- method_calls
names(b1_doublet_negative) <- method_calls
names(b1_doublet_singlet) <- method_calls
names(b2_doublet_doublet) <- method_calls
names(b2_doublet_negative) <- method_calls
names(b2_doublet_singlet) <- method_calls
b1_doublet_assignments <- data.frame("method" = c("hashedDrops",
"hashedDrops (default)",
"HashSolo",
"HTODemux",
"GMM-Demux",
"deMULTIplex",
"BFF_raw",
"BFF_cluster",
"demuxmix"),
"Doublet" = b1_doublet_doublet,
"Negative" = b1_doublet_negative,
"Singlet" = b1_doublet_singlet) %>%
pivot_longer(cols = c("Doublet", "Negative", "Singlet"),
names_to = "assignment",
values_to = "fraction")
b2_doublet_assignments <- data.frame("method" = c("hashedDrops",
"hashedDrops (default)",
"HashSolo",
"HTODemux",
"GMM-Demux",
"deMULTIplex",
"BFF_raw",
"BFF_cluster",
"demuxmix"),
"Doublet" = b2_doublet_doublet,
"Negative" = b2_doublet_negative,
"Singlet" = b2_doublet_singlet) %>%
pivot_longer(cols = c("Doublet", "Negative", "Singlet"),
names_to = "assignment",
values_to = "fraction")
doublet_colours <- c("black", "gray60", "firebrick1")
p2 <- ggplot(b1_doublet_assignments %>%
mutate(method = factor(method, levels = c("BFF_cluster",
"BFF_raw",
"deMULTIplex",
"hashedDrops (default)",
"demuxmix",
"HTODemux",
"GMM-Demux",
"HashSolo",
"hashedDrops")))) +
geom_bar(aes(x = method, y = fraction, fill = assignment),
stat = "identity") +
ggtitle("Batch 1 (1048 doublets)") +
ylim(0, 1) +
scale_fill_manual(values = doublet_colours) +
theme(axis.ticks.x = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_text(size = 12)) +
coord_flip() + NoLegend()
p3 <- ggplot(b2_doublet_assignments %>%
mutate(method = factor(method, levels = c("BFF_cluster",
"BFF_raw",
"deMULTIplex",
"hashedDrops (default)",
"demuxmix",
"HTODemux",
"GMM-Demux",
"HashSolo",
"hashedDrops")))) +
geom_bar(aes(x = method, y = fraction, fill = assignment),
stat = "identity") +
ggtitle("Batch 2 (805 doublets)") +
ylim(0, 1) +
scale_fill_manual(values = doublet_colours) +
theme(axis.ticks.x = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank(),
axis.text.y = element_blank()
) + coord_flip()
p2 | p3
# ggsave(here("paper_latex", "figures", "OT_doublet_assignments.png"),
# p2 | p3,
# device = "png",
# width = 8, height = 5,
# units = "in",
# dpi = 350
# )
sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.0.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] Matrix_1.5-4 viridis_0.6.3
[3] viridisLite_0.4.2 tidyHeatmap_1.8.1
[5] ComplexHeatmap_2.14.0 demuxmix_1.0.0
[7] RColorBrewer_1.1-3 cellhashR_1.0.3
[9] dittoSeq_1.10.0 speckle_0.99.7
[11] pheatmap_1.0.12 SeuratObject_4.1.3
[13] Seurat_4.3.0 scater_1.26.1
[15] scuttle_1.8.4 lubridate_1.9.2
[17] forcats_1.0.0 stringr_1.5.0
[19] purrr_1.0.1 readr_2.1.4
[21] tidyr_1.3.0 tibble_3.2.1
[23] tidyverse_2.0.0 DropletUtils_1.18.1
[25] SingleCellExperiment_1.20.1 SummarizedExperiment_1.28.0
[27] Biobase_2.58.0 GenomicRanges_1.50.2
[29] GenomeInfoDb_1.34.9 IRanges_2.32.0
[31] S4Vectors_0.36.2 BiocGenerics_0.44.0
[33] MatrixGenerics_1.10.0 matrixStats_0.63.0
[35] patchwork_1.1.2 cowplot_1.1.1
[37] ggplot2_3.4.2 janitor_2.2.0
[39] dplyr_1.1.2 BiocStyle_2.26.0
[41] here_1.0.1 workflowr_1.7.0
loaded via a namespace (and not attached):
[1] scattermore_1.0 R.methodsS3_1.8.2
[3] knitr_1.42 irlba_2.3.5.1
[5] DelayedArray_0.24.0 R.utils_2.12.2
[7] data.table_1.14.8 RCurl_1.98-1.12
[9] doParallel_1.0.17 generics_0.1.3
[11] preprocessCore_1.61.0 ScaledMatrix_1.6.0
[13] callr_3.7.3 RANN_2.6.1
[15] future_1.32.0 tzdb_0.3.0
[17] spatstat.data_3.0-1 httpuv_1.6.11
[19] xfun_0.39 hms_1.1.3
[21] jquerylib_0.1.4 evaluate_0.21
[23] promises_1.2.0.1 fansi_1.0.4
[25] dendextend_1.17.1 igraph_1.4.2
[27] DBI_1.1.3 htmlwidgets_1.6.2
[29] spatstat.geom_3.2-1 ellipsis_0.3.2
[31] bookdown_0.34 deldir_1.0-6
[33] sparseMatrixStats_1.10.0 vctrs_0.6.2
[35] Cairo_1.6-0 ROCR_1.0-11
[37] abind_1.4-5 cachem_1.0.8
[39] withr_2.5.0 ggforce_0.4.1
[41] progressr_0.13.0 sctransform_0.3.5
[43] goftest_1.2-3 cluster_2.1.4
[45] lazyeval_0.2.2 crayon_1.5.2
[47] spatstat.explore_3.2-1 edgeR_3.40.2
[49] pkgconfig_2.0.3 labeling_0.4.2
[51] tweenr_2.0.2 nlme_3.1-162
[53] vipor_0.4.5 rlang_1.1.1
[55] globals_0.16.2 lifecycle_1.0.3
[57] miniUI_0.1.1.1 rsvd_1.0.5
[59] ggrastr_1.0.1 rprojroot_2.0.3
[61] polyclip_1.10-4 lmtest_0.9-40
[63] Rhdf5lib_1.20.0 zoo_1.8-12
[65] beeswarm_0.4.0 whisker_0.4.1
[67] ggridges_0.5.4 GlobalOptions_0.1.2
[69] processx_3.8.1 png_0.1-8
[71] rjson_0.2.21 bitops_1.0-7
[73] getPass_0.2-2 R.oo_1.25.0
[75] KernSmooth_2.23-21 rhdf5filters_1.10.1
[77] ggExtra_0.10.0 DelayedMatrixStats_1.20.0
[79] shape_1.4.6 parallelly_1.35.0
[81] spatstat.random_3.1-5 beachmat_2.14.2
[83] scales_1.2.1 magrittr_2.0.3
[85] plyr_1.8.8 ica_1.0-3
[87] zlibbioc_1.44.0 compiler_4.2.2
[89] dqrng_0.3.0 clue_0.3-64
[91] fitdistrplus_1.1-11 snakecase_0.11.0
[93] cli_3.6.1 XVector_0.38.0
[95] listenv_0.9.0 pbapply_1.7-0
[97] ps_1.7.5 MASS_7.3-60
[99] tidyselect_1.2.0 stringi_1.7.12
[101] highr_0.10 yaml_2.3.7
[103] BiocSingular_1.14.0 locfit_1.5-9.7
[105] ggrepel_0.9.3 sass_0.4.6
[107] tools_4.2.2 timechange_0.2.0
[109] future.apply_1.10.0 parallel_4.2.2
[111] circlize_0.4.15 rstudioapi_0.14
[113] foreach_1.5.2 git2r_0.32.0
[115] gridExtra_2.3 rmdformats_1.0.4
[117] farver_2.1.1 Rtsne_0.16
[119] digest_0.6.31 BiocManager_1.30.20
[121] shiny_1.7.4 Rcpp_1.0.10
[123] egg_0.4.5 later_1.3.1
[125] RcppAnnoy_0.0.20 httr_1.4.6
[127] naturalsort_0.1.3 colorspace_2.1-0
[129] fs_1.6.2 tensor_1.5
[131] reticulate_1.28 splines_4.2.2
[133] uwot_0.1.14 spatstat.utils_3.0-3
[135] sp_1.6-0 plotly_4.10.1
[137] xtable_1.8-4 jsonlite_1.8.4
[139] R6_2.5.1 pillar_1.9.0
[141] htmltools_0.5.5 mime_0.12
[143] glue_1.6.2 fastmap_1.1.1
[145] BiocParallel_1.32.6 BiocNeighbors_1.16.0
[147] codetools_0.2-19 utf8_1.2.3
[149] lattice_0.21-8 bslib_0.4.2
[151] spatstat.sparse_3.0-1 ggbeeswarm_0.7.2
[153] leiden_0.4.3 magick_2.7.4
[155] survival_3.5-5 limma_3.54.2
[157] rmarkdown_2.21 munsell_0.5.0
[159] GetoptLong_1.0.5 rhdf5_2.42.1
[161] GenomeInfoDbData_1.2.9 iterators_1.0.14
[163] HDF5Array_1.26.0 reshape2_1.4.4
[165] gtable_0.3.3