Last updated: 2021-10-07
Checks: 7 passed, 0 failed
Knit directory: MINTIE-paper-analysis/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20200415) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 4606aca. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Ignored: analysis/cache/
Ignored: data/.DS_Store
Ignored: data/RCH_B-ALL/
Ignored: data/leucegene/.DS_Store
Ignored: data/leucegene/salmon_out/
Ignored: data/leucegene/sample_info/.DS_Store
Ignored: data/leucegene/sample_info/KMT2A-PTD_8-2.fa.xls
Ignored: output/Leucegene_gene_counts.tsv
Ignored: packrat/lib-R/
Ignored: packrat/lib-ext/
Ignored: packrat/lib/
Untracked files:
Untracked: data/leucegene/validation_results/TAP/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/Leucegene_Validation.Rmd) and HTML (docs/Leucegene_Validation.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 4606aca | mcmero | 2021-10-07 | Increase fig 4 text size |
html | 0355baf | Marek Cmero | 2021-05-31 | Build site. |
Rmd | 8dc1f70 | Marek Cmero | 2021-05-31 | Fixed axis labels |
html | c116fce | Marek Cmero | 2021-05-28 | Build site. |
Rmd | ed3d2b6 | Marek Cmero | 2021-05-28 | Figures tweaks and reordering. |
html | c60c3b4 | Marek Cmero | 2021-05-18 | Build site. |
Rmd | c46b2c5 | Marek Cmero | 2021-05-18 | Added summary figure of variants found in Leucegene cohort |
html | 4206f12 | Marek Cmero | 2021-04-30 | Build site. |
Rmd | dde9f5b | Marek Cmero | 2021-04-30 | wflow_publish(files = list.files(pattern = "*Rmd")) |
Rmd | 9595530 | Marek Cmero | 2021-04-30 | Updated analyses |
html | 4b8113e | Marek Cmero | 2020-07-03 | Build site. |
Rmd | 42ce21b | Marek Cmero | 2020-07-03 | Added CICERO results |
html | 42ce21b | Marek Cmero | 2020-07-03 | Added CICERO results |
html | 379b944 | Marek Cmero | 2020-06-26 | Build site. |
Rmd | 5448658 | Marek Cmero | 2020-06-26 | Update with missing leucegene sample |
html | e9e4917 | Marek Cmero | 2020-06-24 | Build site. |
Rmd | 9434bfe | Marek Cmero | 2020-06-24 | Updated results with latest MINTIE run. Fixed bug with KMT2A PTD checking in different controls. Added leucegene |
html | 0b21347 | Marek Cmero | 2020-06-11 | Build site. |
Rmd | fa6bf0c | Marek Cmero | 2020-06-11 | Updated with new results; improved tables |
html | fa6bf0c | Marek Cmero | 2020-06-11 | Updated with new results; improved tables |
html | 3702862 | Marek Cmero | 2020-05-18 | Removed MLM samples from final B-ALL results |
html | a166ab8 | Marek Cmero | 2020-05-08 | Build site. |
html | a600688 | Marek Cmero | 2020-05-07 | Build site. |
Rmd | 0fde0b8 | Marek Cmero | 2020-05-07 | Added RCH B-ALL analysis |
html | 1c40e33 | Marek Cmero | 2020-05-07 | Build site. |
Rmd | bbc278a | Marek Cmero | 2020-05-07 | Refactoring |
html | 87b4e62 | Marek Cmero | 2020-05-07 | Build site. |
Rmd | af503f2 | Marek Cmero | 2020-05-07 | Refactoring |
html | 5c045b5 | Marek Cmero | 2020-05-07 | Build site. |
Rmd | d8d5b96 | Marek Cmero | 2020-05-07 | Added Leucegene variant validation |
html | 90c7fd9 | Marek Cmero | 2020-05-06 | Build site. |
Rmd | 44d8c37 | Marek Cmero | 2020-05-06 | Build leucegene validation notebook. |
Rmd | ff4b1dc | Marek Cmero | 2020-05-06 | Leucegene results |
# util
library(data.table)
library(dplyr)
library(here)
library(stringr)
# plotting/tables
library(ggplot2)
library(gt)
# bioinformatics
library(GenomicRanges)
options(stringsAsFactors = FALSE)
source(here("code/leucegene_helper.R"))
source(here("code/simu_helper.R"))
Here we analyse the results of MINTIE run on a number of Leucegene samples, including the effect of controls on a cohort with KMT2A-PTD variants. We also check whether MINTIE has called known variants within the cohort.
# load SRX to patient ID lookup table
kmt2a_patient_lookup <- read.delim(here("data/leucegene/sample_info/KMT2A-PTD_samples.txt"),
header = FALSE,
col.names = c("sample", "patient"))
kmt2a_results_dir <- here("data/leucegene/KMT2A-PTD_results")
# load KMT2A cohort comparisons against all other controls
kmt2a_results <- load_controls_comparison(kmt2a_results_dir)
kmt2a_results <- inner_join(kmt2a_results, kmt2a_patient_lookup, by = "sample")
# load other validation results and truth table
truth <- read.delim(here("data/leucegene/sample_info/variant_validation_table.tsv"), sep = "\t")
leucegene_results_dir <- here("data/leucegene/validation_results/MINTIE/")
validation <- list.files(leucegene_results_dir, full.names = TRUE) %>%
lapply(., read.delim) %>%
rbindlist(fill = TRUE) %>%
filter(logFC > 5)
MINTIE paper Supplementary Figure 6. Shows the number of variant genes found in the Leucegene cohort containing KMT2A PTDs.
results_summary <- get_results_summary(mutate(kmt2a_results,
sample = patient,
group_var = controls),
group_var_name = "controls")
# build table
results_summary %>%
group_by(controls) %>%
summarise(min = min(V1), median = median(V1), max = max(V1)) %>%
data.frame() %>%
gt() %>%
tab_header(
title = md("**Total variant genes called using different controls**")
) %>%
tab_options(
table.font.size = 12
) %>%
cols_label(
controls = md("**Controls**"),
min = md("**Min**"),
median = md("**Median**"),
max = md("**Max**")
)
Total variant genes called using different controls
Controls | Min | Median | Max |
---|---|---|---|
AML_controls | 130 | 211.5 | 1852 |
normal_controls | 264 | 645.5 | 2265 |
normal_controls_reduced | 490 | 913.0 | 2552 |
no_controls | 4750 | 9255.5 | 11796 |
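The effect of each control set can be put in relative terms. The following is a minimal sketch that hard-codes the medians from the table above (the object names `median_genes` and `fold_reduction` are illustrative only, not part of the analysis code) and computes how many times smaller the median variant gene count is compared with running MINTIE without controls.

```r
# Median variant gene counts copied from the table above
median_genes <- c(AML_controls            = 211.5,
                  normal_controls         = 645.5,
                  normal_controls_reduced = 913.0,
                  no_controls             = 9255.5)

# Fold reduction in the median relative to the no-controls run
fold_reduction <- median_genes[["no_controls"]] / median_genes
round(fold_reduction, 1)
```

On these numbers, the 13 AML controls reduce the median call set by roughly 44-fold, compared with roughly 14-fold for 13 normals and 10-fold for 3 normals.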
ggplot(results_summary, aes(sample, V1, fill = controls)) +
geom_bar(position = position_dodge2(width = 0.9, preserve = "single"), stat = "identity") +
theme_bw() +
xlab("") +
ylab("Genes with variants") +
coord_flip() +
theme(legend.position = "bottom") +
scale_fill_brewer(palette="Dark2",
labels = c("AML_controls" = "13 AMLs",
"normal_controls" = "13 normals",
"normal_controls_reduced" = "3 normals",
"no_controls" = "No controls"))
MINTIE paper Supplementary Table 1. Shows whether MINTIE found a KMT2A SV in each sample for the given control group. The coverage values come from the Audemard et al. spreadsheet of Leucegene results, which must be manually added to data/leucegene/sample_info to run the code.
# load results from km paper for coverage of KMT2A PTDs
kmt2a_lgene_km_results <- read.csv(here("data/leucegene/sample_info/KMT2A-PTD_8-2.fa.xls"), sep="\t") %>%
mutate(patient = Sample) %>%
group_by(patient) %>%
summarise(coverage = max(Min.coverage))
# check whether MINTIE found a KMT2A SV in each control set
found_using_cancon <- get_samples_with_kmt2a_sv(kmt2a_results, "AML_controls")
found_using_norcon <- get_samples_with_kmt2a_sv(kmt2a_results, "normal_controls")
found_using_redcon <- get_samples_with_kmt2a_sv(kmt2a_results, "normal_controls_reduced")
found_using_nocon <- get_samples_with_kmt2a_sv(kmt2a_results, "no_controls")
# make the table
kmt2a_control_comp <- inner_join(kmt2a_patient_lookup, kmt2a_lgene_km_results, by = "patient") %>%
arrange(desc(coverage))
kmt2a_control_comp$`13_AMLs` <- kmt2a_control_comp$sample %in% found_using_cancon
kmt2a_control_comp$`13_normals` <- kmt2a_control_comp$sample %in% found_using_norcon
kmt2a_control_comp$`3_normals` <- kmt2a_control_comp$sample %in% found_using_redcon
kmt2a_control_comp$`no_controls` <- kmt2a_control_comp$sample %in% found_using_nocon
# build output table
kmt2a_control_comp %>%
gt() %>%
cols_label(
sample = md("**Sample**"),
patient = md("**Patient**"),
coverage = md("**Coverage**"),
`13_AMLs` = md("**13 AMLs**"),
`13_normals` = md("**13 Normals**"),
`3_normals` = md("**3 Normals**"),
`no_controls` = md("**No Controls**")
) %>%
tab_header(
title = md("**KMT2A PTDs found in Leucegene cohort**")
) %>%
tab_options(
table.font.size = 12
) %>%
tab_style(
style = cell_fill(color = "lightgreen"),
locations = cells_body(
columns = vars(`13_AMLs`),
rows = `13_AMLs`)
) %>%
tab_style(
style = cell_fill(color = "lightgreen"),
locations = cells_body(
columns = vars(`13_normals`),
rows = `13_normals`)
) %>%
tab_style(
style = cell_fill(color = "lightgreen"),
locations = cells_body(
columns = vars(`3_normals`),
rows = `3_normals`)
) %>%
tab_style(
style = cell_fill(color = "lightgreen"),
locations = cells_body(
columns = vars(`no_controls`),
rows = `no_controls`)
)
KMT2A PTDs found in Leucegene cohort
Sample | Patient | Coverage | 13 AMLs | 13 Normals | 3 Normals | No Controls |
---|---|---|---|---|---|---|
SRX958906 | 07H152 | 158 | FALSE | TRUE | TRUE | TRUE |
SRX332646 | 09H115 | 125 | FALSE | TRUE | TRUE | TRUE |
SRX957230 | 06H146 | 87 | TRUE | TRUE | TRUE | TRUE |
SRX957223 | 05H111 | 79 | TRUE | TRUE | TRUE | TRUE |
SRX332659 | 11H021 | 63 | FALSE | TRUE | TRUE | TRUE |
SRX332633 | 05H050 | 58 | TRUE | TRUE | TRUE | TRUE |
SRX959061 | 13H150 | 58 | FALSE | TRUE | FALSE | TRUE |
SRX959044 | 13H048 | 57 | TRUE | TRUE | TRUE | TRUE |
SRX958974 | 10H070 | 53 | TRUE | TRUE | TRUE | TRUE |
SRX958963 | 10H007 | 50 | TRUE | TRUE | TRUE | TRUE |
SRX958959 | 09H106 | 49 | TRUE | TRUE | TRUE | TRUE |
SRX959060 | 13H141 | 45 | TRUE | TRUE | TRUE | TRUE |
SRX958945 | 09H058 | 29 | TRUE | TRUE | TRUE | TRUE |
SRX958907 | 07H155 | 23 | FALSE | TRUE | TRUE | TRUE |
SRX381854 | 08H112 | 22 | TRUE | TRUE | TRUE | TRUE |
SRX332645 | 09H113 | 17 | TRUE | TRUE | TRUE | TRUE |
SRX959001 | 11H183 | 16 | FALSE | FALSE | FALSE | FALSE |
SRX381852 | 08H012 | 15 | FALSE | FALSE | FALSE | FALSE |
SRX958932 | 08H138 | 15 | FALSE | FALSE | FALSE | TRUE |
SRX381865 | 11H008 | 13 | FALSE | FALSE | FALSE | TRUE |
SRX958873 | 06H048 | 10 | FALSE | FALSE | FALSE | TRUE |
SRX958922 | 08H063 | 6 | FALSE | FALSE | FALSE | FALSE |
SRX958961 | 10H001 | 6 | FALSE | FALSE | FALSE | TRUE |
SRX958844 | 04H111 | 3 | FALSE | FALSE | FALSE | FALSE |
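The relationship between coverage and detection in the table above can be checked directly. The sketch below transcribes the table by hand, with rows ordered by decreasing coverage as shown. The data frame `kmt2a` and its column names are illustrative stand-ins rather than objects from the analysis. It tallies detections per control set and looks at where detection with 13 normal controls drops off.

```r
# Values transcribed from the table above (rows ordered by decreasing coverage)
kmt2a <- data.frame(
  coverage = c(158, 125, 87, 79, 63, 58, 58, 57, 53, 50, 49, 45,
               29, 23, 22, 17, 16, 15, 15, 13, 10, 6, 6, 3),
  amls_13 = c(FALSE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE,
              TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE,
              rep(FALSE, 8)),
  normals_13 = c(rep(TRUE, 16), rep(FALSE, 8)),
  normals_3 = c(rep(TRUE, 6), FALSE, rep(TRUE, 9), rep(FALSE, 8)),
  no_controls = c(rep(TRUE, 16),
                  FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Number of the 24 samples in which a KMT2A SV was found, per control set
found_per_control <- colSums(kmt2a[, -1])
found_per_control

# Coverage boundary for detection when using 13 normal controls
lowest_detected <- min(kmt2a$coverage[kmt2a$normals_13])
highest_missed  <- max(kmt2a$coverage[!kmt2a$normals_13])
c(lowest_detected = lowest_detected, highest_missed = highest_missed)
```

With 13 normal controls, every sample with coverage of 17 or more is detected and every sample with coverage of 16 or less is missed. The AML controls additionally miss several high-coverage samples.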
# add KMT2A results against the 13 normal controls as validation
validation <- filter(kmt2a_results, controls == "normal_controls") %>%
select(-c(controls, patient)) %>%
select(colnames(validation)) %>%
rbind(., validation)
get_results_by_gene(validation) %>%
group_by(sample) %>%
summarise(vargenes = length(unique(gene))) %>%
summarise(min = min(vargenes),
median = median(vargenes),
max = max(vargenes)) %>%
data.frame() %>%
gt() %>%
tab_header(
title = md("**Total MINTIE variant genes called by sample**")
) %>%
tab_options(
table.font.size = 12
) %>%
cols_label(
min = md("**Min**"),
median = md("**Median**"),
max = md("**Max**")
)
Total MINTIE variant genes called by sample
Min | Median | Max |
---|---|---|
261 | 592 | 2265 |
MINTIE paper Figure 4A.
Note that the TAP results must be obtained from Supplementary Table 4 from Chiu et al. 2018.
# load other callers
arriba_results <- get_results(here("data/leucegene/validation_results/Arriba"))
squid_results <- get_results(here("data/leucegene/validation_results/Squid"))
tap_results <- read.delim(here("data/leucegene/validation_results/TAP/TAP_leucegene_results.txt"), sep = "\t")
# get variant gene locs (needed to check Squid results)
vargene_locs <- read.delim(here("data/leucegene/leucegene_vargene_locs.tsv"), sep = "\t")
vgx <- GRanges(seqnames = vargene_locs$chrom,
ranges = IRanges(start = vargene_locs$start, end = vargene_locs$end),
genes = vargene_locs$gene)
# make truth table
truth_table <- rowwise(truth) %>%
mutate(mintie_found = is_variant_in_mintie_results(Experiment, gene1, gene2, variant, validation)) %>%
mutate(arriba_found = is_variant_in_results(Experiment, gene1, gene2, variant, "arriba", arriba_results)) %>%
mutate(tap_found = is_variant_in_results(patient_ID, gene1, gene2, variant, "tap", tap_results)) %>%
mutate(squid_found = is_variant_in_squid_results(Experiment, gene1, gene2, vgx, squid_results)) %>%
data.frame()
gt(truth_table) %>%
tab_header(
title = md("**Variants found in Leucegene cohort**")
) %>%
cols_label(
patient_ID = md("**Patient**"),
Experiment = md("**Experiment**"),
gene1 = md("**Gene 1**"),
gene2 = md("**Gene 2**"),
variant = md("**Variant**"),
cohort = md("**Cohort**"),
mintie_found = md("**MINTIE Found**"),
arriba_found = md("**Arriba Found**"),
squid_found = md("**Squid Found**"),
tap_found = md("**TAP Found**")
) %>%
tab_options(
table.font.size = 12
) %>%
tab_style(
style = cell_fill(color = "lightgreen"),
locations = cells_body(
columns = vars(mintie_found),
rows = mintie_found)
) %>%
tab_style(
style = cell_fill(color = "lightgreen"),
locations = cells_body(
columns = vars(arriba_found),
rows = arriba_found)
) %>%
tab_style(
style = cell_fill(color = "lightgreen"),
locations = cells_body(
columns = vars(squid_found),
rows = squid_found)
) %>%
tab_style(
style = cell_fill(color = "lightgreen"),
locations = cells_body(
columns = vars(tap_found),
rows = tap_found)
)
Variants found in Leucegene cohort
Patient | Experiment | Gene 1 | Gene 2 | Variant | Cohort | MINTIE Found | Arriba Found | TAP Found | Squid Found |
---|---|---|---|---|---|---|---|---|---|
03H065 | SRX729615 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
03H083 | SRX729616 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
03H095 | SRX729602 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
03H109 | SRX729580 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
03H112 | SRX729581 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
03H112 | SRX729581 | FLT3 | | ITD | CBF | TRUE | FALSE | TRUE | FALSE |
04H030 | SRX729603 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
04H061 | SRX729582 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
04H091 | SRX729583 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
05H042 | SRX729617 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | FALSE |
05H099 | SRX958862 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
05H113 | SRX729604 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
05H118 | SRX729618 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
05H136 | SRX729605 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | FALSE |
05H184 | SRX729619 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
06H020 | SRX729606 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
06H035 | SRX729620 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
06H115 | SRX729607 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
07H099 | SRX381851 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
07H137 | SRX729621 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
07H144 | SRX729585 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
08H034 | SRX729622 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
08H042 | SRX729623 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
08H072 | SRX729624 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | FALSE |
08H072 | SRX729624 | FLT3 | | ITD | CBF | TRUE | TRUE | TRUE | FALSE |
08H081 | SRX729586 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
08H099 | SRX729608 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
09H016 | SRX729587 | CBFB | MYH11 | fusion | CBF | FALSE | TRUE | TRUE | TRUE |
09H040 | SRX729625 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
09H066 | SRX729588 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
10H008 | SRX729609 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
10H030 | SRX729626 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
10H119 | SRX729627 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
11H022 | SRX729610 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
11H022 | SRX729610 | FLT3 | | ITD | CBF | FALSE | FALSE | TRUE | FALSE |
11H104 | SRX729589 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
11H107 | SRX729628 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
11H179 | SRX729611 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
12H042 | SRX729590 | CBFB | MYH11 | fusion | CBF | FALSE | TRUE | TRUE | TRUE |
12H044 | SRX729591 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
12H045 | SRX729629 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | FALSE |
12H098 | SRX729630 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
12H165 | SRX729592 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
12H166 | SRX729631 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
12H180 | SRX729632 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
12H183 | SRX729633 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
13H066 | SRX729612 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
13H120 | SRX959058 | CBFB | MYH11 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
13H169 | SRX959064 | RUNX1 | RUNX1T1 | fusion | CBF | TRUE | TRUE | TRUE | TRUE |
04H111 | SRX958844 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | TRUE |
05H050 | SRX332633 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
05H111 | SRX957223 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
06H048 | SRX958873 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | FALSE |
06H146 | SRX957230 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
07H152 | SRX958906 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
07H155 | SRX958907 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
08H012 | SRX381852 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | TRUE |
08H063 | SRX958922 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | FALSE |
08H112 | SRX381854 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | FALSE |
08H138 | SRX958932 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | TRUE |
09H058 | SRX958945 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
09H106 | SRX958959 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | TRUE | TRUE |
09H113 | SRX332645 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
09H115 | SRX332646 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | FALSE |
10H001 | SRX958961 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | FALSE |
10H007 | SRX958963 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
10H070 | SRX958974 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
11H008 | SRX381865 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | FALSE |
11H021 | SRX332659 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | TRUE |
11H183 | SRX959001 | KMT2A | | PTD | KMT2A-PTD | FALSE | TRUE | FALSE | TRUE |
13H048 | SRX959044 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | FALSE |
13H141 | SRX959060 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | FALSE |
13H150 | SRX959061 | KMT2A | | PTD | KMT2A-PTD | TRUE | TRUE | FALSE | FALSE |
03H041 | SRX332627 | NUP98 | NSD1 | fusion | NUP98-NSD1 | FALSE | TRUE | TRUE | TRUE |
03H041 | SRX332627 | FLT3 | | ITD | NUP98-NSD1 | TRUE | TRUE | TRUE | FALSE |
05H034 | SRX958856 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
05H163 | SRX332635 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
08H049 | SRX958915 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
08H049 | SRX958915 | FLT3 | | ITD | NUP98-NSD1 | TRUE | FALSE | TRUE | FALSE |
10H038 | SRX381861 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
11H027 | SRX958987 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
11H027 | SRX958987 | FLT3 | | ITD | NUP98-NSD1 | TRUE | TRUE | TRUE | FALSE |
11H160 | SRX332667 | NUP98 | NSD1 | fusion | NUP98-NSD1 | TRUE | TRUE | TRUE | TRUE |
11H160 | SRX332667 | FLT3 | | ITD | NUP98-NSD1 | TRUE | FALSE | TRUE | FALSE |
# tally up detected variants into summary table
truth_summary <- truth_table %>%
group_by(gene1, gene2, variant) %>%
summarise(mintie_detected = sum(mintie_found),
arriba_detected = sum(arriba_found),
squid_detected = sum(squid_found),
tap_detected = sum(tap_found),
total = length(mintie_found)) %>%
data.frame()
gt(truth_summary) %>%
tab_header(
title = md("**Summary of variants found in Leucegene cohort**")
) %>%
cols_label(
gene1 = md("**Gene 1**"),
gene2 = md("**Gene 2**"),
variant = md("**Variant**"),
mintie_detected = md("**MINTIE Detected**"),
arriba_detected = md("**Arriba Detected**"),
squid_detected = md("**Squid Detected**"),
tap_detected = md("**TAP Detected**"),
total = md("**Total**")
) %>%
tab_options(
table.font.size = 12
)
Summary of variants found in Leucegene cohort
Gene 1 | Gene 2 | Variant | MINTIE Detected | Arriba Detected | Squid Detected | TAP Detected | Total |
---|---|---|---|---|---|---|---|
CBFB | MYH11 | fusion | 24 | 26 | 25 | 26 | 26 |
FLT3 | | ITD | 6 | 3 | 0 | 7 | 7 |
KMT2A | | PTD | 16 | 24 | 15 | 1 | 24 |
NUP98 | NSD1 | fusion | 6 | 7 | 7 | 7 | 7 |
RUNX1 | RUNX1T1 | fusion | 20 | 20 | 17 | 20 | 20 |
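The summary counts can also be expressed as the fraction of known variants each caller recovered. The following sketch hard-codes the counts from the summary table. The `detected`, `callers` and `overall` objects are illustrative names, not part of the analysis code. It computes both the overall and the per-variant fraction detected.

```r
# Detected counts copied from the summary table above
detected <- data.frame(
  variant = c("CBFB-MYH11 fusion", "FLT3 ITD", "KMT2A PTD",
              "NUP98-NSD1 fusion", "RUNX1-RUNX1T1 fusion"),
  mintie  = c(24, 6, 16, 6, 20),
  arriba  = c(26, 3, 24, 7, 20),
  squid   = c(25, 0, 15, 7, 17),
  tap     = c(26, 7, 1, 7, 20),
  total   = c(26, 7, 24, 7, 20)
)
callers <- c("mintie", "arriba", "squid", "tap")

# Overall fraction of the known variants recovered by each caller
overall <- colSums(detected[callers]) / sum(detected$total)
round(overall, 2)

# Per-variant fraction recovered (each row divided by its total)
per_variant <- detected[callers] / detected$total
rownames(per_variant) <- detected$variant
round(per_variant, 2)
```

This treats every row of the truth table as one known variant, so a patient carrying both a fusion and an FLT3 ITD contributes two entries to the denominator.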
ts <- truth_summary %>%
reshape2::melt() %>%
group_by(variant, variable) %>%
summarise(detected = sum(value)) %>%
data.frame()
ts$method <- gsub("_detected", "", ts$variable)
# reorder factors
ts$method <- factor(ts$method, levels = c("mintie", "arriba", "squid", "tap", "total"))
ts$variant <- factor(paste0(ts$variant, "s"), levels=c("fusions", "PTDs", "ITDs"))
ggplot(ts[ts$method != "total",], aes(method, detected)) +
geom_bar(stat = "identity") +
theme_bw() +
ylab("Detected") +
xlab("Method") +
scale_x_discrete(labels=c("MINTIE", "Arriba", "SQUID", "TAP")) +
scale_y_continuous(breaks = seq(0, 53, 5)) +
geom_hline(data=ts[ts$method == "total",], aes(yintercept=detected), colour="salmon") +
theme(axis.text.x = element_text(size = 11),
axis.text.y = element_text(size = 11),
strip.text.x = element_text(size = 11)) +
facet_grid(~variant)
sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] GenomicRanges_1.44.0 GenomeInfoDb_1.28.4 IRanges_2.26.0
[4] S4Vectors_0.30.0 BiocGenerics_0.38.0 gt_0.3.1
[7] ggplot2_3.3.5 stringr_1.4.0 here_1.0.1
[10] dplyr_1.0.7 data.table_1.14.0 workflowr_1.6.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 assertthat_0.2.1 rprojroot_2.0.2
[4] digest_0.6.27 utf8_1.2.2 R6_2.5.1
[7] plyr_1.8.6 backports_1.2.1 evaluate_0.14
[10] highr_0.9 pillar_1.6.2 zlibbioc_1.38.0
[13] rlang_0.4.11 whisker_0.4 jquerylib_0.1.4
[16] checkmate_2.0.0 rmarkdown_2.11 labeling_0.4.2
[19] RCurl_1.98-1.5 munsell_0.5.0 compiler_4.1.1
[22] httpuv_1.6.3 xfun_0.25 pkgconfig_2.0.3
[25] htmltools_0.5.2 tidyselect_1.1.1 tibble_3.1.4
[28] GenomeInfoDbData_1.2.6 fansi_0.5.0 crayon_1.4.1
[31] withr_2.4.2 later_1.3.0 bitops_1.0-7
[34] commonmark_1.7 grid_4.1.1 gtable_0.3.0
[37] lifecycle_1.0.0 DBI_1.1.1 git2r_0.28.0
[40] magrittr_2.0.1 scales_1.1.1 stringi_1.7.4
[43] farver_2.1.0 XVector_0.32.0 reshape2_1.4.4
[46] fs_1.5.0 promises_1.2.0.1 ellipsis_0.3.2
[49] generics_0.1.0 vctrs_0.3.8 RColorBrewer_1.1-2
[52] tools_4.1.1 glue_1.4.2 purrr_0.3.4
[55] fastmap_1.1.0 yaml_2.2.1 colorspace_2.0-2
[58] knitr_1.33 sass_0.4.0