Last updated: 2026-01-07

Checks: 5 passed, 2 warnings

Knit directory: public_barcode_count/

This reproducible R Markdown analysis was created with workflowr (version 1.7.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

The global environment had objects present when the code in the R Markdown file was run. These objects can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment. Use wflow_publish or wflow_build to ensure that the code is always run in an empty environment.

The following objects were defined in the global environment when these results were created:

Name    Class     Size
module  function  5.6 Kb
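
As a rough illustration of how such a listing can be produced, the sketch below summarises the objects in an environment by name, class, and size using base R only. The helper name `summarise_env` and the demo environment are hypothetical and are not part of workflowr.

```r
# Hypothetical helper: tabulate the objects found in an environment.
summarise_env <- function(env = globalenv()) {
  nms <- ls(env)
  data.frame(
    Name  = nms,
    Class = vapply(nms, function(n) class(get(n, envir = env))[1],
                   character(1), USE.NAMES = FALSE),
    Size  = vapply(nms, function(n) format(object.size(get(n, envir = env)),
                                           units = "auto"),
                   character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Demo on a throwaway environment holding one function called "module".
demo_env <- new.env()
assign("module", function(x) x, envir = demo_env)
env_report <- summarise_env(demo_env)
print(env_report)
```

Calling `summarise_env()` with no arguments inspects the global environment; an empty result before knitting suggests the analysis is not picking up stray objects.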

The command set.seed(20250112) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
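
A minimal base R sketch of what the seed provides: resetting the same seed before a random draw reproduces the draw exactly. The variable names are illustrative only.

```r
# Draw a random subsample after setting the seed recorded above.
set.seed(20250112)
first_draw <- sample(100, 5)

# Resetting the same seed reproduces the same subsample.
set.seed(20250112)
second_draw <- sample(100, 5)

identical(first_draw, second_draw)  # TRUE
```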

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f48add2. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    public_barcode_count.Rproj

Untracked files:
    Untracked:  README.html

Unstaged changes:
    Modified:   README.md
    Modified:   analysis/index.Rmd
    Modified:   output/fs1_mixture.png
    Modified:   output/mixture_barbieQ.rda

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/index.Rmd) and HTML (docs/index.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author      Date        Message
Rmd   f48add2  FeiLiyang   2026-01-07  supple analyses
html  f48add2  FeiLiyang   2026-01-07  supple analyses
Rmd   34f5894  FeiLiyang   2026-01-01  reorder f3
html  34f5894  FeiLiyang   2026-01-01  reorder f3
html  6b0ff60  Liyang Fei  2025-05-14  customize wflow
Rmd   88e2a58  Liyang Fei  2025-05-14  add analysis/
Rmd   e3acf90  feiliyang   2025-01-14  initalize WuC analysis
html  e3acf90  feiliyang   2025-01-14  initalize WuC analysis
Rmd   9ec2763  feiliyang   2025-01-12  Start workflowr project.

1 Project aim

This project gathered several public barcode count datasets and analyzed them using the barbieQ package.

2 Analyses in the barbieQ paper

barbieQ preprint on bioRxiv

barbieQ R package on GitHub

barbieQ R package on Bioconductor

Figure                  Content
Figure 1                Package flowchart
Figure 2                Preprocessing Monkey HSPC data
Figure S1 AML           Preprocessing AML data
Figure S1 HSPC xeno     Preprocessing HSPC xenograft data
Figure S1 Mixture       Preprocessing Mixture data
Figure 3 and Figure S3  Assessing Type I error rate and power of statistical tests using Mixture data
Figure S2 AML           Assessing Type I error rate using AML data
Figure S2 HSPC xeno     Assessing Type I error rate using HSPC xenograft data
Figure 4                Case study using Monkey HSPC data

3 Datasets

3.1 HSPC xenograft data Access

Public data from a study of engraftment tracking of human umbilical cord blood HSPC xenotransplanted into mice. The data were analysed in the barcodetrackR publication and made available via the compatible barcodetrackRData repository on GitHub.

In this study, human cord blood HSPCs from 20 individual donors were isolated and barcoded at the DNA level, separately for each donor. The starting clones from each donor were transplanted into 1 or 2 recipient mice (n = 30 mice in total). For each mouse, progeny cells were collected from peripheral blood at multiple time points, as well as from various tissues. Cells from each collection were sorted into different cell types, each forming an individual sample, and unsorted cells were also retained. Here, we used data from recipient mice with sufficiently engrafted HSPC clones, comprising 10,149 barcodes from 8 donors across 199 samples.

3.2 AML data Access

Publicly available data from a study of acute myeloid leukaemia (AML) clones that investigated heterogeneity in their response to various therapeutic drugs in vitro. The data were originally analysed and introduced with bartools. Briefly, AML cells were barcoded at the DNA level using the SPLINTR system, allowing them to be tracked as individual clones, and the population was expanded under exposure to different drugs (Arac, IBET) at various doses, with DMSO as a negative control. Samples were collected from each treatment condition at a series of time points. The barcode count matrix contains 1,811 barcodes across 41 samples.

3.3 Mixture data Access

Original publication

This dataset was generated to simulate both true and null changes in barcode abundance by mixing cells from two barcoded samples. Cells from each cell line were divided into two pools (Pool1 and Pool2), each carrying distinct clonal tracking barcodes integrated into the DNA, so that no barcode is shared between pools. Cells in each pool were counted and mixed in an equal ratio to produce a mixed pool. Twelve baseline samples were drawn from the mixed pool at various sample sizes. Twenty-four perturbed samples were generated by sampling a set number of cells from the mixed pool and adding cells from Pool1 at a set ratio, with 2 replicates in each case. The dataset contains 3,998 barcodes across these 36 samples, plus two reference samples representing the unmixed Pool1 and Pool2 barcode counts.
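
The mixing design can be illustrated with a small simulation in base R. This is only a sketch under assumed values: the pool sizes, cell numbers, spike-in ratio, and seed below are hypothetical and are not the parameters of the original experiment, and the spike-in ratio is assumed here to be relative to the number of cells drawn from the mixed pool.

```r
set.seed(1)  # hypothetical seed, used only for this illustration

# Two pools with non-overlapping barcodes (hypothetical pool sizes).
pool1 <- paste0("P1_", seq_len(20))
pool2 <- paste0("P2_", seq_len(20))
stopifnot(length(intersect(pool1, pool2)) == 0)
all_bc <- c(pool1, pool2)

# Mixed pool: equal numbers of cells from each pool, one barcode per cell.
cells_per_pool <- 5000
mixed <- c(sample(pool1, cells_per_pool, replace = TRUE),
           sample(pool2, cells_per_pool, replace = TRUE))

# Count cells per barcode, keeping barcodes with zero counts.
count_barcodes <- function(cells, barcodes) {
  as.integer(table(factor(cells, levels = barcodes)))
}

# Baseline sample: cells drawn from the mixed pool only (null change).
n_cells <- 2000
baseline <- sample(mixed, n_cells)

# Perturbed sample: mixed-pool cells plus extra Pool1 cells (true change).
spike_ratio <- 0.5  # assumed: extra Pool1 cells relative to n_cells
perturbed <- c(sample(mixed, n_cells),
               sample(pool1, round(n_cells * spike_ratio), replace = TRUE))

counts <- cbind(baseline  = count_barcodes(baseline, all_bc),
                perturbed = count_barcodes(perturbed, all_bc))
rownames(counts) <- all_bc

# Fraction of each sample made up of Pool1 barcodes.
pool1_frac <- colSums(counts[pool1, ]) / colSums(counts)
print(round(pool1_frac, 3))
```

In this toy setup the Pool1 fraction stays near one half in the baseline sample and rises in the perturbed sample, so Pool1 barcodes represent true increases while the baseline samples represent the null.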

3.4 Monkey HSPC data Access

A subset of publicly available data from a study of monkey hematopoietic stem and progenitor cell (HSPC) clonal expansion in vivo using a barcoding technique. The monkey HSPC data have been analysed using the barcodetrackR package and made available via the compatible barcodetrackRData repository on GitHub. Here, we used data from monkey “ZG66”, which include 16,603 barcodes across all samples, and selected 30 of those samples for analysis.

Briefly, unique barcodes were integrated into the DNA of HSPCs and subsequently passed on to progeny cells. At a series of time points, progeny cells were collected from blood and sorted into various cell types, including T cells, B cells, granulocytes (Gr), and natural killer (NK) cells of the NKCD56+CD16- and NKCD56-CD16+ subtypes. Barcode counts across the different cell types were used to interpret patterns of HSPC differentiation. The original study focused on identifying barcodes (clones) with higher abundance in NKCD56-CD16+ samples than in samples of other cell types.
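
To make the comparison concrete, the sketch below flags barcodes whose mean proportion is higher in one group of samples than in the rest, using base R on a toy count matrix. The counts, sample names, offset, and threshold are all hypothetical. This is a purely descriptive comparison and is not the statistical test used by barbieQ or by the original study.

```r
# Toy barcode-by-sample count matrix (hypothetical values, not the ZG66 data).
counts <- matrix(
  c(50, 40,  5,  4,
    10, 12, 11,  9,
     2,  3, 30, 35),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("barcode", 1:3),
                  c("NK_CD16_a", "NK_CD16_b", "T_a", "B_a"))
)

# Hypothetical sample annotation: TRUE marks NKCD56-CD16+ samples.
is_nk <- c(TRUE, TRUE, FALSE, FALSE)

# Convert counts to within-sample proportions so samples are comparable.
props <- sweep(counts, 2, colSums(counts), "/")

# Log2 ratio of mean proportions; the small offset avoids log(0).
offset <- 1e-4
log2_ratio <- log2((rowMeans(props[, is_nk, drop = FALSE]) + offset) /
                   (rowMeans(props[, !is_nk, drop = FALSE]) + offset))

# Flag barcodes at least two-fold more abundant in the NK group (assumed cutoff).
enriched <- names(log2_ratio)[log2_ratio > 1]
print(enriched)
```

Working with proportions rather than raw counts keeps samples with different sequencing depths comparable; a formal analysis would also need to account for replicate variability.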


sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Red Hat Enterprise Linux 9.6 (Plow)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

time zone: Australia/Melbourne
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflowr_1.7.2

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       httr_1.4.7        cli_3.6.5         knitr_1.50       
 [5] rlang_1.1.6       xfun_0.53         stringi_1.8.7     processx_3.8.6   
 [9] promises_1.3.3    jsonlite_2.0.0    glue_1.8.0        rprojroot_2.1.1  
[13] git2r_0.36.2      htmltools_0.5.8.1 httpuv_1.6.16     ps_1.9.1         
[17] sass_0.4.10       rmarkdown_2.30    jquerylib_0.1.4   tibble_3.3.0     
[21] evaluate_1.0.5    fastmap_1.2.0     yaml_2.3.10       lifecycle_1.0.4  
[25] whisker_0.4.1     stringr_1.5.2     compiler_4.5.0    fs_1.6.6         
[29] pkgconfig_2.0.3   Rcpp_1.1.0        rstudioapi_0.17.1 later_1.4.4      
[33] digest_0.6.37     R6_2.6.1          pillar_1.11.1     callr_3.7.6      
[37] magrittr_2.0.4    bslib_0.9.0       tools_4.5.0       cachem_1.1.0     
[41] getPass_0.2-4