Last updated: 2026-01-07

Checks: 5 passed, 2 warnings

Knit directory: public_barcode_count/

This reproducible R Markdown analysis was created with workflowr (version 1.7.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: uncommitted changes

The R Markdown file has unstaged changes. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Environment: objects present

The global environment had objects present when the code in the R Markdown file was run. These objects can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment. Use wflow_publish or wflow_build to ensure that the code is always run in an empty environment.

The following objects were defined in the global environment when these results were created:

Name	Class	Size
module	function	5.6 Kb

Seed: set.seed(20250112)

The command set.seed(20250112) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
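The effect of the seed can be checked with a minimal sketch like the one below. It uses only base R; the barcode IDs are hypothetical toy values, not data from this project.

```r
# Draw the same subsample twice from the same seed (toy data, not the project's).
barcodes <- paste0("BC", 1:100)  # hypothetical barcode IDs

set.seed(20250112)
first_draw <- sample(barcodes, 10)

set.seed(20250112)
second_draw <- sample(barcodes, 10)

# Resetting the seed restores the RNG state, so both draws match.
identical(first_draw, second_draw)

# Without resetting the seed, the RNG stream has moved on,
# so this draw is not tied to the two above.
third_draw <- sample(barcodes, 10)
```

Any chunk that subsamples or permutes barcodes behaves the same way: rerunning the whole file from the same seed reproduces the same draws, while rerunning a single chunk in the middle of a session generally does not.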

Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Cache: none

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

File paths: relative

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Repository version: f48add2

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f48add2. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    public_barcode_count.Rproj

Untracked files:
    Untracked:  README.html

Unstaged changes:
    Modified:   README.md
    Modified:   analysis/index.Rmd
    Modified:   output/fs1_mixture.png
    Modified:   output/mixture_barbieQ.rda

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the repository in which changes were made to the R Markdown (analysis/index.Rmd) and HTML (docs/index.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File	Version	Author	Date	Message
Rmd	f48add2	FeiLiyang	2026-01-07	supple analyses
html	f48add2	FeiLiyang	2026-01-07	supple analyses
Rmd	34f5894	FeiLiyang	2026-01-01	reorder f3
html	34f5894	FeiLiyang	2026-01-01	reorder f3
html	6b0ff60	Liyang Fei	2025-05-14	customize wflow
Rmd	88e2a58	Liyang Fei	2025-05-14	add analysis/
Rmd	e3acf90	feiliyang	2025-01-14	initalize WuC analysis
html	e3acf90	feiliyang	2025-01-14	initalize WuC analysis
Rmd	9ec2763	feiliyang	2025-01-12	Start workflowr project.

1 Project aim

This project gathered several public barcode count datasets and analysed them using the barbieQ package.

2 Analyses in the barbieQ paper

barbieQ preprint on bioRxiv

barbieQ R package on GitHub

barbieQ R package on Bioconductor

Figure	Content
Figure 1	Package flowchart
Figure 2	Preprocessing Monkey HSPC data
Figure S1 AML	Preprocessing AML data
Figure S1 HSPC xeno	Preprocessing HSPC xenograft data
Figure S1 Mixture	Preprocessing Mixture data
Figure 3 and Figure S3	Assessing Type I error rate and power of statistical tests using Mixture data
Figure S2 AML	Assessing Type I error rate using AML data
Figure S2 HSPC xeno	Assessing Type I error rate using HSPC xenograft data
Figure 4	Case study using Monkey HSPC data

3 Datasets

3.1 HSPC xenograft data Access

Public data from a study of engraftment tracking of human umbilical cord blood HSPC xenotransplanted into mice. The data were analysed in the barcodetrackR publication and made available via the compatible barcodetrackRData repository on GitHub.

In this study, human cord blood HSPCs from 20 individual donors were isolated and barcoded separately at the DNA level. Sets of starting clones from different donors were transplanted into different mice (n = 30), with each donor's cells going to 1 or 2 recipient mice. For each mouse, progeny cells were collected from peripheral blood at multiple time points, as well as from various tissues. From each collection, cells were sorted into different cell types forming individual samples, with unsorted cells also retained. Herein, we used data from recipient mice with sufficiently engrafted HSPC clones, comprising 10,149 barcodes from 8 donors across 199 samples under the conditions described above.

3.2 AML data Access

Publicly available data from a study of acute myeloid leukaemia (AML) clones that investigated the heterogeneity of their response to various therapeutic drugs in vitro. The data were originally analysed and introduced with bartools. Briefly, AML cells were barcoded at the DNA level using the SPLINTR system so that each barcode identifies an individual clone, and the population was expanded under exposure to different drugs (Arac, IBET) at various doses, with DMSO as a negative control. Samples were collected from each treatment condition at a series of time points. This barcode count matrix contains 1,811 barcodes across 41 samples under the described conditions.

3.3 Mixture data Access

Original publication

Dataset generated to simulate both true and null changes in barcode abundance by mixing cells from two barcoded samples. Cells from each cell line were divided into two pools (Pool1 and Pool2), and each pool incorporated distinct clonal tracking barcodes into its DNA, ensuring no barcode overlap between pools. Cells in each pool were counted and mixed in an equal ratio to produce a mixed pool. Twelve baseline samples were drawn from the mixed pool at various sizes. Twenty-four perturbed samples were generated by sampling a certain number of cells from the mixed pool and adding a certain ratio of cells from Pool1, with 2 replicates in each case. This dataset contains 3,998 barcodes across the 36 samples described, as well as two reference samples representing the unmixed Pool1 and Pool2 barcode counts.
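The spike-in design can be illustrated with some simple arithmetic. The sketch below is not taken from the original publication: it assumes the mixed pool is exactly 50% Pool1, that the added Pool1 cells have the same clonal composition (in expectation) as the Pool1 cells already in the mixed pool, and that the hypothetical parameter `spike_ratio` means added Pool1 cells divided by cells sampled from the mixed pool. The actual ratios and their definition are not stated here.

```r
# Expected Pool1 fraction in a perturbed sample (illustrative arithmetic only).
# Assumptions: the mixed pool is exactly 50% Pool1, and `spike_ratio` is the
# number of added Pool1 cells divided by the number of cells taken from the
# mixed pool. Both are hypothetical conventions, not taken from the dataset.
expected_pool1_fraction <- function(spike_ratio, mixed_fraction = 0.5) {
  stopifnot(all(spike_ratio >= 0), mixed_fraction >= 0, mixed_fraction <= 1)
  (mixed_fraction + spike_ratio) / (1 + spike_ratio)
}

spike_ratios <- c(0, 0.1, 0.5, 1)  # hypothetical values
pool1_fraction <- expected_pool1_fraction(spike_ratios)

# Expected fold change in the proportion of any single barcode,
# relative to a baseline sample drawn from the mixed pool alone.
pool1_fold <- pool1_fraction / 0.5
pool2_fold <- (1 - pool1_fraction) / 0.5

data.frame(spike_ratio = spike_ratios, pool1_fraction, pool1_fold, pool2_fold)
```

Under these assumptions, a spike ratio of 1 raises the expected Pool1 fraction from 0.5 to 0.75, so each Pool1 barcode's proportion is expected to rise 1.5-fold while each Pool2 barcode's proportion is expected to halve.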

3.4 Monkey HSPC data Access

A subset of publicly available data from a study of monkey hematopoietic stem and progenitor cell (HSPC) clonal expansion in vivo using a barcoding technique. The monkey HSPC data have been analysed using the barcodetrackR package and made available via the compatible barcodetrackRData repository on GitHub. Herein, we used data from monkey “ZG66”, which include 16,603 barcodes across all samples; from these samples we selected 30 for the analyses here.

Briefly, unique barcodes were initially integrated into the DNA of HSPCs and subsequently passed to progeny cells. At a series of time points, progeny cells were collected from blood and sorted into various cell types, including T cells, B cells, granulocytes (Gr), and natural killer (NK) cells of the NKCD56+CD16- and NKCD56-CD16+ subtypes. Barcode counts across different cell types were used to interpret the patterns of HSPC differentiation. The original study focused on identifying barcodes (clones) with higher abundance in NKCD56-CD16+ samples compared to samples of other cell types.
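As a rough illustration of what "higher abundance in one cell type" means for a barcode count matrix, the sketch below converts toy counts to within-sample proportions and compares group means. It is a descriptive comparison in base R on made-up numbers, with hypothetical sample and barcode names; it is not the statistical test used by barbieQ or by the original study.

```r
# Toy barcode count matrix: rows are barcodes, columns are samples.
# All names and numbers are made up for illustration.
counts <- matrix(
  c(120, 100,  10,  12,
     30,  40,  45,  38,
     50,  60, 145, 150),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("BC1", "BC2", "BC3"),
    c("NK_CD16_1", "NK_CD16_2", "T_1", "B_1")
  )
)
cell_type <- c("NK_CD56-CD16+", "NK_CD56-CD16+", "T", "B")

# Convert counts to within-sample proportions so samples are comparable
# regardless of their total counts.
proportions <- sweep(counts, 2, colSums(counts), "/")

# Mean proportion per barcode in the NK subtype versus all other samples.
is_nk <- cell_type == "NK_CD56-CD16+"
mean_nk <- rowMeans(proportions[, is_nk, drop = FALSE])
mean_other <- rowMeans(proportions[, !is_nk, drop = FALSE])

# Barcodes whose mean proportion is higher in the NK subtype.
nk_biased <- names(which(mean_nk > mean_other))
nk_biased
```

In this toy matrix only BC1 has a higher mean proportion in the NK subtype samples. A real analysis would also need to account for replicate variability and multiple testing, which this sketch ignores.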

sessionInfo()

R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Red Hat Enterprise Linux 9.6 (Plow)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8    
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8   
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       

time zone: Australia/Melbourne
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] workflowr_1.7.2

loaded via a namespace (and not attached):
 [1] vctrs_0.6.5       httr_1.4.7        cli_3.6.5         knitr_1.50       
 [5] rlang_1.1.6       xfun_0.53         stringi_1.8.7     processx_3.8.6   
 [9] promises_1.3.3    jsonlite_2.0.0    glue_1.8.0        rprojroot_2.1.1  
[13] git2r_0.36.2      htmltools_0.5.8.1 httpuv_1.6.16     ps_1.9.1         
[17] sass_0.4.10       rmarkdown_2.30    jquerylib_0.1.4   tibble_3.3.0     
[21] evaluate_1.0.5    fastmap_1.2.0     yaml_2.3.10       lifecycle_1.0.4  
[25] whisker_0.4.1     stringr_1.5.2     compiler_4.5.0    fs_1.6.6         
[29] pkgconfig_2.0.3   Rcpp_1.1.0        rstudioapi_0.17.1 later_1.4.4      
[33] digest_0.6.37     R6_2.6.1          pillar_1.11.1     callr_3.7.6      
[37] magrittr_2.0.4    bslib_0.9.0       tools_4.5.0       cachem_1.1.0     
[41] getPass_0.2-4

Analysis of public barcode count datasets

Liyang Fei

Initiate: early 2025

Last update: 2026-01-07
