Last updated: 2025-09-10

Checks: 7 passed, 0 failed

Knit directory: paediatric-cf-inflammation-citeseq/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20240216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version d121b7e. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store
    Ignored:    analysis/figure/
    Ignored:    code/obsolete/
    Ignored:    data/.DS_Store
    Ignored:    data/C133_Neeland_batch1/
    Ignored:    data/C133_Neeland_merged/
    Ignored:    data/intermediate_objects/.DS_Store
    Ignored:    renv/library/
    Ignored:    renv/staging/

Untracked files:
    Untracked:  analysis/14.2_DGE_analysis_ciliated-epithelial-cells.Rmd
    Untracked:  analysis/16.5_Figure_6.Rmd
    Untracked:  analysis/16.5_Supplementary_Figure_2.Rmd
    Untracked:  analysis/cellxgene_submission.Rmd
    Untracked:  analysis/epithelial_cell_analysis.Rmd
    Untracked:  analysis/epithelial_cell_analysis.nb.html
    Untracked:  data/GOBP_CYTOKINE_MEDIATED_SIGNALING_PATHWAY.v2025.1.Hs.tsv
    Untracked:  data/Neeland_processed_data_1.h5ad
    Untracked:  data/Neeland_processed_data_2.h5ad
    Untracked:  data/Neeland_processed_data_3.h5ad
    Untracked:  data/cellxgene_cell_ontologies_ann_level_3.xlsx
    Untracked:  data/gencode.v44.primary_assembly.annotation.gtf
    Untracked:  data/intermediate_objects/CD4 T cells.CF_samples.fit.rds
    Untracked:  data/intermediate_objects/CD4 T cells.all_samples.fit.rds
    Untracked:  data/intermediate_objects/CD8 T cells.CF_samples.fit.rds
    Untracked:  data/intermediate_objects/CD8 T cells.all_samples.fit.rds
    Untracked:  data/intermediate_objects/DC cells.CF_samples.fit.rds
    Untracked:  data/intermediate_objects/DC cells.all_samples.fit.rds
    Untracked:  data/updated_h5ad_files/
    Untracked:  output/dge_analysis/epithelial cells/
    Untracked:  output/pdf_figures/
    Untracked:  paediatric-cf-inflammation-citeseq.Rproj

Unstaged changes:
    Modified:   .DS_Store
    Modified:   .gitignore
    Modified:   analysis/13.0_DGE_analysis_macrophages.Rmd
    Modified:   analysis/13.1_DGE_analysis_macro-alveolar.Rmd
    Modified:   analysis/13.2_DGE_analysis_macro-APOC2+.Rmd
    Modified:   analysis/13.3_DGE_analysis_macro-CCL.Rmd
    Modified:   analysis/13.4_DGE_analysis_macro-IFI27.Rmd
    Modified:   analysis/13.5_DGE_analysis_macro-lipid.Rmd
    Modified:   analysis/13.6_DGE_analysis_macro-monocyte-derived.Rmd
    Modified:   analysis/13.7_DGE_analysis_macro-proliferating.Rmd
    Modified:   analysis/14.0_DGE_analysis_CD4-T-cells.Rmd
    Modified:   analysis/14.1_DGE_analysis_CD8-T-cells.Rmd
    Modified:   analysis/14.2_DGE_analysis_DC-cells.Rmd
    Modified:   analysis/16.1_Figure_2.Rmd
    Modified:   analysis/16.2_Figure_3.Rmd
    Modified:   analysis/16.3_Figure_4.Rmd
    Modified:   analysis/16.4_Figure_5.Rmd
    Deleted:    analysis/16.5_Supplementary_Figure_ADTs.Rmd
    Modified:   analysis/16.6_Supplementary_Figures.Rmd
    Deleted:    code/run_cellbender.R
    Modified:   code/utility.R
    Modified:   data/intermediate_objects/macrophages.CF_samples.fit.rds
    Modified:   data/intermediate_objects/macrophages.all_samples.fit.rds
    Modified:   output/dge_analysis/macrophages/CAM.FIBROSIS.CF.IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CAM.FIBROSIS.CF.IVAvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/CAM.FIBROSIS.CF.LUMA_IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CAM.FIBROSIS.CF.NO_MOD.SvCF.NO_MOD.M.csv
    Modified:   output/dge_analysis/macrophages/CAM.FIBROSIS.CF.NO_MODvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/CAM.GO.CF.IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CAM.GO.CF.IVAvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/CAM.GO.CF.LUMA_IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CAM.GO.CF.NO_MOD.SvCF.NO_MOD.M.csv
    Modified:   output/dge_analysis/macrophages/CAM.GO.CF.NO_MODvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/CAM.HALLMARK.CF.IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CAM.HALLMARK.CF.IVAvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/CAM.HALLMARK.CF.LUMA_IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CAM.HALLMARK.CF.NO_MOD.SvCF.NO_MOD.M.csv
    Modified:   output/dge_analysis/macrophages/CAM.HALLMARK.CF.NO_MODvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/CAM.REACTOME.CF.IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CAM.REACTOME.CF.IVAvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/CAM.REACTOME.CF.LUMA_IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CAM.REACTOME.CF.NO_MOD.SvCF.NO_MOD.M.csv
    Modified:   output/dge_analysis/macrophages/CAM.REACTOME.CF.NO_MODvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/CAM.WP.CF.IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CAM.WP.CF.IVAvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/CAM.WP.CF.LUMA_IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CAM.WP.CF.NO_MOD.SvCF.NO_MOD.M.csv
    Modified:   output/dge_analysis/macrophages/CAM.WP.CF.NO_MODvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/CF.IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CF.IVAvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/CF.LUMA_IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/CF.NO_MOD.SvCF.NO_MOD.M.csv
    Modified:   output/dge_analysis/macrophages/CF.NO_MODvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/ORA.GO.CF.IVAvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/ORA.GO.CF.NO_MOD.SvCF.NO_MOD.M.csv
    Modified:   output/dge_analysis/macrophages/ORA.GO.CF.NO_MODvNON_CF.CTRL.csv
    Modified:   output/dge_analysis/macrophages/ORA.HALLMARK.CF.IVAvCF.NO_MOD.csv
    Modified:   output/dge_analysis/macrophages/ORA.REACTOME.CF.NO_MODvNON_CF.CTRL.csv
    Deleted:    paed-inflammation-CITEseq.Rproj
    Modified:   renv.lock
    Modified:   renv/activate.R
    Modified:   renv/settings.json

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/16.0_Figure_1.Rmd) and HTML (docs/16.0_Figure_1.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author             Date        Message
Rmd   d121b7e  Jovana Maksimovic  2025-09-10  wflow_publish("analysis/16.0_Figure_1.Rmd")
Rmd   6691547  Jovana Maksimovic  2025-08-13  Updated Figure 1 code to add cell type dendrogram.
html  15e0d6f  Jovana Maksimovic  2025-02-20  Build site.
Rmd   bbe91a3  Jovana Maksimovic  2025-02-20  wflow_publish("analysis/16.0_Figure_1.Rmd")
html  360908b  Jovana Maksimovic  2025-02-17  Build site.
Rmd   0246a56  Jovana Maksimovic  2025-02-17  wflow_publish("analysis/16.0_Figure_1.Rmd")

Load libraries.

suppressPackageStartupMessages({
 library(SingleCellExperiment)
 library(edgeR)
 library(tidyverse)
 library(ggplot2)
 library(Seurat)
 library(glmGamPoi)
 library(dittoSeq)
 library(here)
 library(clustree)
 library(patchwork)
 library(AnnotationDbi)
 library(org.Hs.eg.db)
 library(glue)
 library(speckle)
 library(tidyHeatmap)
 library(paletteer)
 library(dsb)
 library(ggh4x)
 library(readxl)
})

source(here("code/utility.R"))

Prepare figure panels

ADT heatmap

Load the ADT data from saved, pre-processed objects.

files <- list.files(here("data/C133_Neeland_merged"),
                    pattern = "C133_Neeland_full_clean.*(macrophages|t_cells|other_cells)_annotated_full.SEU.rds",
                    full.names = TRUE)

# read in the ADT data, excluding everything else
seuLst <- lapply(files, function(f){
  s <- readRDS(f)
  DefaultAssay(s) <- "ADT"
  DietSeurat(s, assays = "ADT", dimreducs = NULL)
})

adt_names <- rownames(seuLst[[1]][["ADT"]]@counts)
seuLst <- lapply(seuLst, function(s){
  DefaultAssay(s) <- "ADT"
  
  if(!all(rownames(s) == adt_names)){
    adt_counts <- s[["ADT"]]@counts
    rownames(adt_counts) <- adt_names
    CreateSeuratObject(counts = adt_counts,
                       assay = "ADT",
                       meta.data = s@meta.data)
    
  } else {
    s
  }
})

seuADT <- merge(seuLst[[1]], 
                y = c(seuLst[[2]], 
                      seuLst[[3]]))
seuADT <- seuADT[, seuADT$Batch != 0]
seuADT
An object of class Seurat 
163 features across 168859 samples within 1 assay 
Active assay: ADT (163 features, 0 variable features)
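The row-name harmonisation step above only makes sense if every object stores the same ADT features in the same order and differs purely in naming. A minimal base R sketch of that idea, using small hypothetical matrices (`m1`, `m2`) and a hypothetical helper (`harmonise_rows`) in place of the Seurat assays:

```r
# Toy stand-ins for two ADT count matrices (features x cells); the feature
# names and values are hypothetical.
m1 <- matrix(1:6, nrow = 3,
             dimnames = list(c("CD3", "CD4", "CD8"), c("cell1", "cell2")))
m2 <- matrix(7:12, nrow = 3,
             dimnames = list(c("CD3.1", "CD4.1", "CD8.1"), c("cell3", "cell4")))

ref_names <- rownames(m1)

# Overwrite row names only when they differ from the reference AND the
# number of features matches; otherwise stop rather than mislabel rows.
harmonise_rows <- function(m, ref) {
  if (identical(rownames(m), ref)) return(m)
  stopifnot(nrow(m) == length(ref))
  rownames(m) <- ref
  m
}

m2 <- harmonise_rows(m2, ref_names)
merged <- cbind(m1, m2)
dim(merged)
```

Renaming by position is only safe when the feature order is known to agree across objects; if that is in doubt, matching on a cleaned feature identifier would be a more cautious choice.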

Normalise the ADT data across all cells using DSB.

# read in TotalSeq feature information
read_csv(file = here("data",
                     "C133_Neeland_batch1",
                     "data",
                     "sample_sheets",
                     "ADT_features.csv")) -> adt_data
pattern <- "anti-human/mouse |anti-human/mouse/rat |anti-mouse/human "
adt_data$name <- gsub(pattern, "", adt_data$name)

out <- here("data",
            "C133_Neeland_merged",
            glue("C133_Neeland_full_clean_all_cells_dsb.ADT.rds"))
if(!file.exists(out)){
  adt_data %>%
    dplyr::filter(grepl("[Ii]sotype", name)) %>%
    pull(name) -> isotype_controls
  
  # normalise ADT using DSB normalisation
  adt_dsb <- ModelNegativeADTnorm(cell_protein_matrix = seuADT[["ADT"]]@counts,
                                  denoise.counts = TRUE,
                                  use.isotype.control = TRUE,
                                  isotype.control.name.vec = isotype_controls)
  saveRDS(adt_dsb, file = out)
  
} else {
  adt_dsb <- readRDS(out)
  
}

seuADT[["ADT"]]@data <- adt_dsb
seuADT
An object of class Seurat 
163 features across 168859 samples within 1 assay 
Active assay: ADT (163 features, 0 variable features)
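The normalisation chunk above follows a compute-or-load pattern: the expensive step runs only when the cached RDS file is missing. A minimal sketch of the same pattern as a reusable helper, where `cache_rds` and `slow_step` are hypothetical names and a temporary file stands in for the real output path:

```r
# Compute-or-load helper: run `compute()` once, cache the result as RDS,
# and read the cached copy on later calls.
cache_rds <- function(path, compute) {
  if (!file.exists(path)) {
    result <- compute()
    saveRDS(result, file = path)
    result
  } else {
    readRDS(path)
  }
}

out <- tempfile(fileext = ".rds")   # stand-in for the real output path
n_calls <- 0
slow_step <- function() {
  n_calls <<- n_calls + 1           # count how often the work is redone
  matrix(rnorm(6), nrow = 2)
}

first  <- cache_rds(out, slow_step)
second <- cache_rds(out, slow_step) # served from the cache
```

A cache like this is not invalidated when upstream inputs change, so the file needs to be deleted by hand to force a recompute.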

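Set up a lookup table that maps each fine-grained cell type label (ann_level_3) to its three-level annotation hierarchy (ann_level_1, ann_level_2, ann_level_3).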

labels <- unique(seuADT$ann_level_3)

# Make a lookup table: label -> ann_level_1/2/3
hier_lut <- tibble(label = labels) %>%
  mutate(
    ann_level_1 = case_when(
      str_detect(label, "^macro") ~ "myeloid",
      label %in% c("monocytes","neutrophil-like","mast cells") ~ "myeloid",
      label %in% c("cDC1","cDC2","plasmacytoid DC","migratory DC") ~ "myeloid",
      label %in% c("CD4 T cells","CD8 T-rm","CD8 T-GZMK","CD8 T-inflammasome",
                   "CD4 T-rm","CD4 T-NFKB","CD4 T-naïve","CD4 T-IFN","CD4 T-reg",
                   "gamma delta T cells","NK-T cells","proliferating T/NK") ~ "lymphoid",
      label %in% c("NK cells","innate lymphocytes","dividing innate cells") ~ "lymphoid",
      label %in% c("B cells","HSP+ B cells","plasma B cells") ~ "lymphoid",
      label %in% c("secretory epithelial cells","ciliated epithelial cells") ~ "epithelial",
      TRUE ~ "other"
    ),
    ann_level_2 = case_when(
      # Myeloid
      str_detect(label, "^macro") ~ "macrophages",
      label %in% c("monocytes") ~ "monocytes",
      label %in% c("neutrophil-like", "mast cells") ~ "granulocytes",
      label %in% c("cDC1","cDC2","plasmacytoid DC","migratory DC") ~ "dendritic cells",

      # Lymphoid
      label %in% c("CD4 T cells","CD4 T-rm","CD4 T-NFKB","CD4 T-naïve","CD4 T-IFN","CD4 T-reg",
                   "CD8 T-rm","CD8 T-GZMK","CD8 T-inflammasome","gamma delta T cells","NK-T cells",
                   "proliferating T/NK") ~ "T cells",
      label %in% c("NK cells","innate lymphocytes","dividing innate cells") ~ "innate lymphoid",
      label %in% c("B cells","HSP+ B cells","plasma B cells") ~ "B cells",

      # Epithelial
      label %in% c("secretory epithelial cells","ciliated epithelial cells") ~ "epithelial cells",
      TRUE ~ "other"
    ),
    ann_level_3 = case_when(
      # Macrophage subtypes
      label == "macro-alveolar" ~ "macro-alveolar",
      str_detect(label, "^macro-alveolar") ~ label,  # keep specific alveolar flavors
      label %in% c("macro-interstitial","macro-monocyte-derived","macro-T") ~ label,
      label %in% c("macro-proliferating-S","macro-proliferating-G2M") ~ "macro-proliferating",

      # DC subtypes
      label %in% c("cDC1","cDC2","plasmacytoid DC","migratory DC") ~ label,

      # Neutro/mono/mast
      label %in% c("monocytes","neutrophil-like","mast cells") ~ label,

      # T cells
      label == "CD4 T cells" ~ "CD4 T cells",
      label %in% c("CD4 T-rm","CD4 T-NFKB","CD4 T-naïve","CD4 T-IFN","CD4 T-reg") ~ label,
      label %in% c("CD8 T-rm","CD8 T-GZMK","CD8 T-inflammasome") ~ label,
      label %in% c("gamma delta T cells","NK-T cells") ~ label,
      label == "proliferating T/NK" ~ "proliferating T-NK",

      # NK / ILC
      label %in% c("NK cells","innate lymphocytes","dividing innate cells") ~ label,

      # B lineage
      label %in% c("B cells","HSP+ B cells","plasma B cells") ~ label,

      # Epithelia
      label %in% c("secretory epithelial cells","ciliated epithelial cells") ~ label,

      TRUE ~ label
    )
  )

# Inspect:
hier_lut
# A tibble: 44 × 4
   label                      ann_level_1 ann_level_2      ann_level_3          
   <chr>                      <chr>       <chr>            <chr>                
 1 cDC2                       myeloid     dendritic cells  cDC2                 
 2 plasmacytoid DC            myeloid     dendritic cells  plasmacytoid DC      
 3 mast cells                 myeloid     granulocytes     mast cells           
 4 B cells                    lymphoid    B cells          B cells              
 5 monocytes                  myeloid     monocytes        monocytes            
 6 cDC1                       myeloid     dendritic cells  cDC1                 
 7 ciliated epithelial cells  epithelial  epithelial cells ciliated epithelial …
 8 neutrophil-like            myeloid     granulocytes     neutrophil-like      
 9 secretory epithelial cells epithelial  epithelial cells secretory epithelial…
10 migratory DC               myeloid     dendritic cells  migratory DC         
# ℹ 34 more rows
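Because each `case_when()` above ends in a catch-all branch, a label that no rule covers is silently assigned "other" at levels 1 and 2. A quick check for such labels could look like the sketch below; `toy_lut` and its labels are invented stand-ins for `hier_lut`:

```r
# Toy lookup table with the same kind of columns as `hier_lut`; labels are
# made up for illustration, including one that no rule covers.
toy_lut <- data.frame(
  label       = c("macro-alveolar", "cDC1", "B cells", "mystery cells"),
  ann_level_1 = c("myeloid", "myeloid", "lymphoid", "other"),
  stringsAsFactors = FALSE
)

# Labels that reached the catch-all branch and may need an explicit rule.
unmatched <- toy_lut$label[toy_lut$ann_level_1 == "other"]
unmatched

# Each label should appear exactly once so that a later join cannot
# duplicate cells.
stopifnot(!anyDuplicated(toy_lut$label))
```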

Map long labels to short labels.

# mapping to get short labels
lab_map <- c(
  "DC cells"                  = "DC",
  "mast cells"                = "Mast",
  "B cells"                   = "B",
  "monocytes"                 = "Mono",
  "epithelial cells"          = "Epi",
  "neutrophils"               = "Neut",
  "dividing innate cells"     = "Div innate",
  "NK cells"                  = "NK",
  "CD4 T cells"               = "CD4 T",
  "CD8 T cells"               = "CD8 T",
  "innate lymphocyte"         = "ILC",
  "gamma delta T cells"       = "γδ T",
  "NK-T cells"                = "NKT",
  "proliferating T/NK"        = "Prolif T/NK",
  "macrophages"               = "Mac",
  "proliferating macrophages" = "Prolif Mac"
)

# map long labels to short labels (index by name, not by factor code)
seuADT$short_labels <- unname(lab_map[as.character(seuADT$ann_level_1)])
# match ordering of the levels between long and short labels
lut <- unique(seuADT@meta.data[, c("ann_level_1", "short_labels")])
lut <- lut[match(levels(factor(seuADT$ann_level_1)), lut$ann_level_1), , drop = FALSE]
# update level ordering for short labels
seuADT$short_labels <- factor(seuADT$short_labels, levels = lut$short_labels)
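A named character vector such as `lab_map` works as a lookup table when it is indexed by character labels. One thing to watch with this pattern: if the annotation column is a factor, indexing with it directly uses the factor's integer codes rather than its labels. The sketch below illustrates the difference and the level-ordering step on invented data (`ann` is a hypothetical annotation vector):

```r
lab_map <- c("B cells" = "B", "NK cells" = "NK", "macrophages" = "Mac")

# Hypothetical per-cell annotation stored as a factor with explicit levels.
ann <- factor(c("macrophages", "B cells", "NK cells", "B cells"),
              levels = c("macrophages", "B cells", "NK cells"))

# Indexing with a factor uses its integer codes, not its labels.
by_code  <- unname(lab_map[ann])
# Converting to character first looks entries up by name.
by_label <- unname(lab_map[as.character(ann)])

# Order the short-label levels to follow the long-label levels.
short <- factor(by_label, levels = unname(lab_map[levels(ann)]))
levels(short)
```

Here `by_code` and `by_label` disagree, which is why converting to character before the lookup is the safer habit.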

Make data frame of proteins, clusters, expression levels. Examine distribution of expression for heatmap scaling.

# ADTs <- read_csv(file = here("data",
#                        "Proteins_broad_22.04.22.csv"))
# pattern <- "anti-human/mouse |anti-human/mouse/rat |anti-mouse/human |anti-human "
# ADTs$Description <- gsub(pattern, "", ADTs$Description)

labels <- readxl::read_excel(here("data/main_proteins.xlsx"))

unnest(enframe(setNames(str_split(labels$`main proteins`, ", "),
                        labels$`cell type`),
               value = "ADT",
               name = "cluster"),
       cols = ADT) %>%
  arrange(cluster) %>%
  distinct() -> markers

markers <- markers[markers$ADT %in% rownames(seuADT),]

seuADT@meta.data %>%
  dplyr::select(short_labels) %>%
  rownames_to_column(var = "cell") %>%
  inner_join(as.data.frame(t(seuADT[["ADT"]]@data)) %>%
               rownames_to_column(var = "cell"),
             by = "cell") %>%
  pivot_longer(c(-cell, -short_labels),
               names_to = "ADT",
               values_to = "Expression") %>%
  dplyr::group_by(short_labels, ADT) %>%
  dplyr::summarize(Expression = mean(Expression)) %>%
  ungroup() %>%
  dplyr::filter(ADT %in% markers$ADT) -> dat

plot(density(dat$Expression))
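The density plot above is used to judge whether the fixed colour range chosen for the heatmap (0 to 2) suits the averaged values; `circlize::colorRamp2` maps anything outside its breaks to the end colours. A base R sketch of the same two steps, per-cell-type means followed by a range check, on an invented matrix (`adt`, `cell_type` and the values are hypothetical):

```r
set.seed(1)
# Toy normalised ADT matrix (proteins x cells) and a cell type per cell.
adt <- matrix(rnorm(2 * 6, mean = 1), nrow = 2,
              dimnames = list(c("CD3", "CD19"), paste0("cell", 1:6)))
cell_type <- c("T", "T", "T", "B", "B", "B")

# Mean expression of each protein within each cell type
# (rows = cell types, columns = proteins).
means <- rowsum(t(adt), group = cell_type) / as.vector(table(cell_type))

# Share of averaged values that fall outside the colour scale limits.
limits <- c(0, 2)
outside <- mean(means < limits[1] | means > limits[2])
```

If a large share of values falls outside the limits, the heatmap will saturate and widening the breaks may be worth considering.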

dat %>%
  dplyr::rename("Protein" = "ADT",
                "ADT Exp." = "Expression",
                "Cell type" = "short_labels") %>%
  tidyHeatmap::heatmap(
    .column = Protein,
    .row = `Cell type`,
    .value = `ADT Exp.`,
    scale = "none",
    rect_gp = grid::gpar(col = "white", lwd = 1),
    show_row_names = TRUE, 
    cluster_rows = FALSE,
    cluster_columns = FALSE,
    column_names_gp = grid::gpar(fontsize = 8, fontfamily = "arial"),
    column_title_gp = grid::gpar(fontsize = 10, fontfamily = "arial"),
    row_names_gp = grid::gpar(fontsize = 8, fontfamily = "arial"),
    row_title_gp = grid::gpar(fontsize = 10, fontfamily = "arial"),
    column_title_side = "top",
    palette_value = circlize::colorRamp2(seq(0, 2, length.out = 11),
                                         rev(RColorBrewer::brewer.pal(11, "RdYlBu"))),
    heatmap_legend_param = list(direction = "vertical")) %>%
  add_tile(`Cell type`, show_legend = FALSE,
           show_annotation_name = FALSE,
           palette = paletteer_d("miscpalettes::pastel", 
                                 length(unique(seuADT$ann_level_1)))) %>%
  as_ComplexHeatmap() -> f1e
f1e

Version Author Date
15e0d6f Jovana Maksimovic 2025-02-20
360908b Jovana Maksimovic 2025-02-17

UMAP of all cells

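Split by batch for integration and normalise with SCTransform, then integrate using reciprocal PCA (RPCA). The Seurat Fast integration with RPCA vignette recommends increasing the k.anchor parameter (for example, to 20) to strengthen alignment; the FindIntegrationAnchors call below does not set k.anchor, so the Seurat default (5) applies whenever the integration is recomputed.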

# clean up memory
gc()
            used   (Mb) gc trigger    (Mb) limit (Mb)   max used    (Mb)
Ncells  10952704  585.0   18090492   966.2         NA   18090492   966.2
Vcells 157042193 1198.2 3255902500 24840.6      65536 4390652805 33498.1
out <- here("data",
            "C133_Neeland_merged",
            glue("C133_Neeland_full_clean_integrated_all_cells.SEU.rds"))

if(!file.exists(out)){
  
  # load annotated cells for each cell type group
  files <- list.files(here("data/C133_Neeland_merged"),
                      pattern = "C133_Neeland_full_clean.*(macrophages|t_cells|other_cells)_annotated_diet.SEU.rds",
                      full.names = TRUE)
  seuLst <- lapply(files[2:4], function(f) readRDS(f))
  # merge into a single object
  seu <- merge(seuLst[[1]],
               y = c(seuLst[[2]],
                     seuLst[[3]]))
  rm(seuLst)
  gc()
  
  # Assign each cell a score, based on its expression of G2/M and S phase markers as described in the Seurat workflow
  # https://satijalab.org/seurat/articles/cell_cycle_vignette.html
  s.genes <- cc.genes.updated.2019$s.genes
  g2m.genes <- cc.genes.updated.2019$g2m.genes
  seu <- CellCycleScoring(seu, s.features = s.genes, g2m.features = g2m.genes, 
                          set.ident = TRUE)

  # Using the `Seurat` *Alternate Workflow* from (https://satijalab.org/seurat/articles/cell_cycle_vignette.html),
  # calculate the difference between the G2M and S phase scores so that signals separating non-cycling cells and cycling
  # cells will be maintained, but differences in cell cycle phase among proliferating cells (which are often
  # uninteresting), can be regressed out of the data.
  seu$CC.Difference <- seu$S.Score - seu$G2M.Score
  
  gns <- AnnotationDbi::select(org.Hs.eg.db,
                               keys = rownames(seu),
                               columns = c("CHR","ENTREZID"),
                               keytype = "SYMBOL",
                               multiVals = "first")
  
  m <- match(rownames(seu), gns$SYMBOL)
  gns <- gns[m,]
  # remove HLA, immunoglobulin, MT, RPL, MRPL and sex chromosome genes prior to integration
  var_regex <- '^HLA-|^IG[HJKL]|^MT-|^RPL|^MRPL'
  keep <- !(str_detect(rownames(seu), var_regex) | gns$CHR %in% c("X","Y"))
  seu <- seu[keep,] 

  DefaultAssay(seu) <- "RNA"
  VariableFeatures(seu) <- NULL
  
  seuLst <- SplitObject(seu, split.by = "Batch")
  rm(seu)
  gc()
  
  # normalise with SCTransform and regress out cell cycle score difference
  seuLst <- lapply(X = seuLst, FUN = SCTransform, method = "glmGamPoi",
                   vars.to.regress = "CC.Difference")
  # integrate RNA data
  features <- SelectIntegrationFeatures(object.list = seuLst,
                                        nfeatures = 3000)
  seuLst <- PrepSCTIntegration(object.list = seuLst, anchor.features = features)
  seuLst <- lapply(X = seuLst, FUN = RunPCA, features = features)
  anchors <- FindIntegrationAnchors(object.list = seuLst,
                                    normalization.method = "SCT",
                                    anchor.features = features,
                                    dims = 1:30, reduction = "rpca")
  seu <- IntegrateData(anchorset = anchors, 
                       normalization.method = "SCT",
                       dims = 1:30)
  
  DefaultAssay(seu) <- "integrated"
  seu <- RunPCA(seu, dims = 1:30, verbose = FALSE) %>%
    RunUMAP(dims = 1:30, verbose = FALSE)

  saveRDS(seu, file = out)
  fs::file_chmod(out, "664")
  if(any(str_detect(fs::group_ids()$group_name, "oshlack_lab"))){
    fs::file_chown(out, group_id = "oshlack_lab")
  }
} else {
  seu <- readRDS(file = out)
  
}
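The name-based part of the gene filter in the chunk above is easiest to understand by running the pattern against a handful of symbols. The sketch below does that with a few real gene symbols picked to exercise each branch; the selection is illustrative and not taken from the data set:

```r
var_regex <- "^HLA-|^IG[HJKL]|^MT-|^RPL|^MRPL"

# A few gene symbols chosen to exercise each branch of the pattern.
genes <- c("HLA-DRA", "IGHG1", "IGKC", "MT-CO1", "RPL13", "RPS6",
           "MRPL12", "MRPS7", "CD3E", "IGF1")

matched <- grepl(var_regex, genes)
setNames(matched, genes)

# Genes that would be kept by the name-based part of the filter.
genes[!matched]
```

As written, the pattern removes the large-subunit ribosomal genes (`RPL*`, `MRPL*`) but leaves the small-subunit genes (`RPS*`, `MRPS*`) in place.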

Map long cell type labels to short labels for RNA data.

# map long labels to short labels (index by name, not by factor code)
seu$short_labels <- unname(lab_map[as.character(seu$ann_level_1)])
# match ordering of the levels between long and short labels
lut <- unique(seu@meta.data[, c("ann_level_1", "short_labels")])
lut <- lut[match(levels(factor(seu$ann_level_1)), lut$ann_level_1), , drop = FALSE]
# update level ordering for short labels
seu$short_labels <- factor(seu$short_labels, levels = lut$short_labels)
options(ggrepel.max.overlaps = Inf)
cluster_pal <- "miscpalettes::pastel"

DimPlot(seu, group.by = "Group")

FeaturePlot(seu, features = "Age") +
  scale_color_viridis_c(option = "magma", direction = -1)

DimPlot(seu, 
        group.by = "short_labels", label = FALSE) +
  scale_color_paletteer_d(cluster_pal) +
  NoLegend() -> p1

LabelClusters(p1, id = "short_labels", repel = TRUE,
              size = 2.5, box = TRUE,
              fontface = "bold") +
  theme(axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.line = element_blank(),
        plot.title = element_blank()) -> f1b

f1b

Cell proportions by sample

samp_map <-
c(
  "CF.IVA" = "CF (iva)",
  "CF.LUMA_IVA" = "CF (luma/iva)",
  "CF.NO_MOD" = "CF (no mod)",
  "NON_CF.CTRL" = "Non-CF control"
)

seu@meta.data %>%
  dplyr::select(sample.id, Group) %>%
  mutate(Group = samp_map[Group]) %>%
  count(sample.id, Group) %>% 
  ungroup() %>%
ggplot(aes(x = sample.id, y = n, fill = Group)) +
  geom_bar(stat = "identity", color = "black", linewidth = 0.1) +
  theme_classic() +
  theme(axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.line.x = element_blank(),
        strip.text = element_blank(),
        strip.background = element_blank(),
        plot.margin = unit(c(0, 0, 0, 0), "lines")) +
  labs(y = "Number of cells", fill = "Condition") +
  scale_fill_paletteer_d("RColorBrewer::Set2", direction = 1) +
  facet_grid(~Group, scales = "free_x", space = "free_x") +
  scale_y_continuous(expand = expansion(mult = c(0.01, 0.02))) -> p2

props <- getTransformedProps(clusters = seu$short_labels,
                             sample = seu$sample.id, transform="asin")

props$Proportions %>%
  data.frame %>%
  inner_join(seu@meta.data %>%
               dplyr::select(sample.id,
                             Group),
             by = c("sample" = "sample.id")) %>%
  distinct() %>%
ggplot(aes(x = sample, y = Freq, fill = clusters)) +
  geom_bar(stat = "identity", color = "black", linewidth = 0.1) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45,
                                   vjust = 1,
                                   hjust = 1,
                                   size = 8),
        strip.text = element_blank(),
        strip.background = element_blank(),
        plot.margin = unit(c(0, 0, 0, 0), "lines")) +
  labs(y = "Cell type proportion", fill = "Cell type", x = "Sample") +
  scale_fill_paletteer_d("miscpalettes::pastel", direction = 1) +
  facet_grid(~Group, scales = "free_x", space = "free_x") +
  scale_y_continuous(expand = expansion(mult = c(0.02, 0.03))) -> p3

(p2 / p3) + plot_layout(guides = "collect") &
  theme(legend.text = element_text(size = 8),
        legend.title = element_text(size = 10),
        legend.key.size = unit(0.8, "lines")) -> f1c

f1c
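The stacked bars above show, for each sample, the fraction of its cells assigned to each cell type. The sketch below reproduces that calculation in base R on invented annotations, and also applies an arcsine square-root transform, on the assumption that this is what `transform = "asin"` requests from `getTransformedProps`:

```r
# Toy per-cell annotations (samples and cell types are invented).
sample_id <- c("s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2")
cluster   <- c("Mac", "Mac", "Mac", "B", "Mac", "B", "B", "B")

# Cell counts per cluster (rows) and sample (columns).
counts <- table(cluster, sample_id)

# Within-sample proportions: each column sums to 1.
props <- prop.table(counts, margin = 2)

# Arcsine square-root transform, commonly used to stabilise the variance
# of proportions before linear modelling.
props_asin <- asin(sqrt(props))
```

Only the untransformed proportions are plotted in the figure; the transformed values matter for downstream testing rather than for display.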

Seurat marker gene dotplot

DefaultAssay(seu) <- "RNA"
Idents(seu) <- "ann_level_1"

gns <- AnnotationDbi::select(org.Hs.eg.db,
                             keys = rownames(seu),
                             columns = c("CHR","ENTREZID"),
                             keytype = "SYMBOL",
                             multiVals = "first")
m <- match(rownames(seu), gns$SYMBOL)
gns <- gns[m,]

out <- here("data/cluster_annotations/seurat_markers_all_cells.rds")

if(!file.exists(out)){
  keep <- !is.na(gns$ENTREZID)
  markers <- FindAllMarkers(seu, only.pos = TRUE, logfc.threshold = 0.5,
                            features = rownames(seu)[rownames(seu) %in% gns$SYMBOL[keep]],
                            max.cells.per.ident = 10000)
  saveRDS(markers, file = out)

} else {
  markers <- readRDS(out)

}

# labels <- readxl::read_excel(here("data/main_marker_genes.xlsx"))
# 
# unnest(enframe(setNames(str_split(labels$`main marker genes`, ", "),
#                         labels$`cell type`),
#                value = "gene",
#                name = "cluster"),
#        cols = gene) %>%
#   arrange(cluster) %>%
#   distinct() -> markers

markers <- markers[markers$gene %in% rownames(seu),]
draw_marker_gene_dotplot(seu,
                         markers %>% mutate(cluster = as.character(cluster)),
                         ann_level = "ann_level_1",
                         cluster_pal,
                         lab_map = lab_map,
                         direction = 1,
                         num = 5) -> f1d

f1d

Version Author Date
15e0d6f Jovana Maksimovic 2025-02-20
360908b Jovana Maksimovic 2025-02-17
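`draw_marker_gene_dotplot` is defined in code/utility.R and is not shown on this page, so its internals are unknown here. Assuming that `num = 5` means "plot the top five markers per cluster" and that markers are ranked by `avg_log2FC` (both assumptions), the selection step could be sketched as follows on an invented table; `top_markers` is a hypothetical helper:

```r
# Toy table with the FindAllMarkers() columns that matter here;
# the fold changes are invented.
markers <- data.frame(
  cluster    = c("T", "T", "T", "B", "B", "B"),
  gene       = c("CD3E", "CD2", "IL7R", "MS4A1", "CD79A", "CD19"),
  avg_log2FC = c(2.1, 1.4, 3.0, 2.5, 0.9, 1.8),
  stringsAsFactors = FALSE
)

# Keep the `num` genes with the largest log fold change in each cluster.
top_markers <- function(df, num) {
  per_cluster <- lapply(split(df, df$cluster), function(d) {
    head(d[order(d$avg_log2FC, decreasing = TRUE), ], num)
  })
  do.call(rbind, unname(per_cluster))
}

top2 <- top_markers(markers, num = 2)
top2[, c("cluster", "gene")]
```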

Cell type dendrogram

library(dplyr)
library(data.tree)
library(igraph)
library(ggraph)

# ----- 1) one row per leaf label: cell counts and full path -----
paths <- (seu@meta.data %>% 
            dplyr::select(ann_level_3) %>%
            dplyr::rename(label = ann_level_3)) %>%
  left_join(hier_lut, by = "label") %>%
  mutate(across(starts_with("ann_level_"), ~ tidyr::replace_na(.x, "unassigned"))) %>%
  count(ann_level_1, ann_level_2, ann_level_3, name = "cells") %>%
  mutate(pathString = paste("all cells", ann_level_1, ann_level_2, ann_level_3, sep = "/"))


# ----- 2) tree & propagate counts -----
tree <- as.Node(paths, pathName = "pathString")
tree$Do(function(node) {
  if (!node$isLeaf) node$cells <- sum(sapply(node$children, function(ch) ifelse(is.null(ch$cells), 0, ch$cells)))
}, traversal = "post-order")

# ----- 3) nodes/edges + lineage & clean labels -----
nodes <- ToDataFrameTree(tree, "levelName", "pathString", "cells", "level") %>%
  transmute(
    name       = pathString,  # unique vertex id (full path)
    short_name = str_squish(str_replace(levelName, "^[^A-Za-z]+", "")),  # strip only leading non-letters
    cells      = cells,
    depth      = level,
    lineage    = sub("^all cells/([^/]+).*", "\\1", name)  # ann_level_1 from path segment 2
  )

edges <- nodes %>%
  mutate(parent = ifelse(name == "all cells", NA_character_, dirname(name))) %>%
  filter(!is.na(parent)) %>%
  dplyr::select(from = parent, to = name)

g <- graph_from_data_frame(edges, vertices = nodes, directed = TRUE)

# mark leaves once (optional; ggraph usually provides `leaf` automatically for dendrograms)
V(g)$leaf <- igraph::degree(g, mode = "out") == 0

ggraph(g, layout = "dendrogram") +
  geom_edge_diagonal(color = "lightgrey") +
  geom_node_point(aes(size = cells, colour = lineage)) +
  geom_node_text(aes(label = short_name),
                 angle = 0, 
                 hjust = 0, 
                 size = 3, 
                 nudge_y = 0.05,
                 show.legend = FALSE) +
  scale_size_continuous(name = "No. cells", range = c(1, 6)) +
  # Root on left, leaves on right, with padding on BOTH sides
  coord_flip(clip = "off") +  # KEY: no clipping
  scale_y_reverse(expand = expansion(mult = c(0.02, 0.3))) +
  theme_void() +
  theme(plot.margin = margin(0, 0, 0, 0)) +
  labs(colour = "Lineage") -> f1f

f1f
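The dendrogram code above relies on two ideas: node sizes are leaf cell counts summed up the hierarchy, and a node's parent is its path with the last segment removed. A base R sketch of both, without data.tree, using invented paths and counts (`leaves` and `ancestors` are hypothetical names):

```r
# Leaf-level paths and cell counts; labels and numbers are invented.
leaves <- data.frame(
  path  = c("all cells/myeloid/macrophages/macro-alveolar",
            "all cells/myeloid/dendritic cells/cDC1",
            "all cells/lymphoid/B cells/B cells"),
  cells = c(100, 20, 30),
  stringsAsFactors = FALSE
)

# Every ancestor of a path, from the root down to the path itself.
ancestors <- function(p) {
  parts <- strsplit(p, "/", fixed = TRUE)[[1]]
  vapply(seq_along(parts),
         function(i) paste(parts[seq_len(i)], collapse = "/"),
         character(1))
}

# Add each leaf count to all of its ancestors.
totals <- numeric(0)
for (i in seq_len(nrow(leaves))) {
  for (node in ancestors(leaves$path[i])) {
    current <- if (node %in% names(totals)) totals[[node]] else 0
    totals[[node]] <- current + leaves$cells[i]
  }
}

# The parent of a node is its path with the last segment removed.
edges <- data.frame(from = dirname(names(totals)), to = names(totals),
                    stringsAsFactors = FALSE)
edges <- edges[edges$to != "all cells", ]
```

Because "/" is the path separator, a label that itself contains "/" would be split into extra levels, which is presumably why the lookup table rewrites "proliferating T/NK" as "proliferating T-NK".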

Figure 1

layout = "
BBBCCCCC
BBBCCCCC
BBBCCCCC
BBBCCCCC
DDDDDDDD
DDDDDDDD
DDDDDDDD
DDDDDDDD
FFFFGGGG
FFFFGGGG
FFFFGGGG
FFFFGGGG
FFFFGGGG
"
(wrap_elements(f1b + theme(plot.margin = unit(rep(0,4), "cm"))) + 
    wrap_elements(f1c + theme(plot.margin = unit(rep(0,4), "cm"))) + 
    wrap_elements(f1d + theme(plot.margin = unit(rep(0,4), "cm"))) + 
    wrap_plots(list(f1e %>% 
                      ComplexHeatmap::draw(heatmap_legend_side = "left") %>% 
                      grid::grid.grabExpr())) +
    wrap_elements(f1f + theme(plot.margin = unit(rep(0,4), "cm")))) + 
  plot_layout(design = layout) +
  plot_annotation(tag_levels = list(c("B","C","D","E","F"))) &
  theme(plot.tag = element_text(size = 24,
                                face = "bold",
                                family = "arial"))

Version Author Date
15e0d6f Jovana Maksimovic 2025-02-20
360908b Jovana Maksimovic 2025-02-17
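In a patchwork design string such as the one above, each distinct letter marks the rectangular area occupied by one panel. The sketch below parses a small made-up design in base R and reports the height and width of each area, which can help when checking a layout before rendering; `design`, `grid` and `extent` are illustrative names:

```r
# A small, made-up design: one letter per grid cell, one line per row.
design <- "
AAB
AAB
CCC
"

rows <- strsplit(trimws(design), "\n", fixed = TRUE)[[1]]
# All rows must have the same width for the design to form a grid.
stopifnot(length(unique(nchar(rows))) == 1)
grid <- do.call(rbind, strsplit(rows, "", fixed = TRUE))

# Height and width (in grid cells) of the area covered by each letter.
areas <- sort(unique(as.vector(grid)))
extent <- t(sapply(areas, function(a) {
  idx <- which(grid == a, arr.ind = TRUE)
  c(rows = diff(range(idx[, "row"])) + 1,
    cols = diff(range(idx[, "col"])) + 1)
}))
extent
```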

Session info


sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Melbourne
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] igraph_2.0.1.1              data.tree_1.1.0            
 [3] readxl_1.4.3                ggh4x_0.3.1                
 [5] dsb_1.0.3                   paletteer_1.6.0            
 [7] tidyHeatmap_1.8.1           speckle_1.2.0              
 [9] glue_1.8.0                  org.Hs.eg.db_3.18.0        
[11] AnnotationDbi_1.64.1        patchwork_1.3.1            
[13] clustree_0.5.1              ggraph_2.2.0               
[15] here_1.0.1                  dittoSeq_1.14.2            
[17] glmGamPoi_1.14.3            SeuratObject_4.1.4         
[19] Seurat_4.4.0                lubridate_1.9.3            
[21] forcats_1.0.0               stringr_1.5.1              
[23] dplyr_1.1.4                 purrr_1.0.2                
[25] readr_2.1.5                 tidyr_1.3.1                
[27] tibble_3.2.1                ggplot2_3.5.2              
[29] tidyverse_2.0.0             edgeR_4.0.15               
[31] limma_3.58.1                SingleCellExperiment_1.24.0
[33] SummarizedExperiment_1.32.0 Biobase_2.62.0             
[35] GenomicRanges_1.54.1        GenomeInfoDb_1.38.6        
[37] IRanges_2.36.0              S4Vectors_0.40.2           
[39] BiocGenerics_0.48.1         MatrixGenerics_1.14.0      
[41] matrixStats_1.2.0           workflowr_1.7.1            

loaded via a namespace (and not attached):
  [1] fs_1.6.6                spatstat.sparse_3.0-3   bitops_1.0-7           
  [4] httr_1.4.7              RColorBrewer_1.1-3      doParallel_1.0.17      
  [7] tools_4.3.3             sctransform_0.4.1       utf8_1.2.4             
 [10] R6_2.5.1                lazyeval_0.2.2          uwot_0.1.16            
 [13] GetoptLong_1.0.5        withr_3.0.0             sp_2.1-3               
 [16] gridExtra_2.3           progressr_0.14.0        cli_3.6.5              
 [19] Cairo_1.6-2             spatstat.explore_3.2-6  labeling_0.4.3         
 [22] prismatic_1.1.1         sass_0.4.10             spatstat.data_3.0-4    
 [25] ggridges_0.5.6          pbapply_1.7-2           parallelly_1.37.0      
 [28] rstudioapi_0.15.0       RSQLite_2.3.5           generics_0.1.3         
 [31] shape_1.4.6             vroom_1.6.5             ica_1.0-3              
 [34] spatstat.random_3.2-2   dendextend_1.17.1       Matrix_1.6-5           
 [37] fansi_1.0.6             abind_1.4-5             lifecycle_1.0.4        
 [40] whisker_0.4.1           yaml_2.3.8              SparseArray_1.2.4      
 [43] Rtsne_0.17              grid_4.3.3              blob_1.2.4             
 [46] promises_1.2.1          crayon_1.5.2            miniUI_0.1.1.1         
 [49] lattice_0.22-5          cowplot_1.1.3           KEGGREST_1.42.0        
 [52] pillar_1.9.0            knitr_1.50              ComplexHeatmap_2.18.0  
 [55] rjson_0.2.21            future.apply_1.11.1     codetools_0.2-19       
 [58] leiden_0.4.3.1          getPass_0.2-4           data.table_1.15.0      
 [61] vctrs_0.6.5             png_0.1-8               cellranger_1.1.0       
 [64] gtable_0.3.6            rematch2_2.1.2          cachem_1.0.8           
 [67] xfun_0.52               S4Arrays_1.2.0          mime_0.12              
 [70] tidygraph_1.3.1         survival_3.5-8          pheatmap_1.0.12        
 [73] iterators_1.0.14        statmod_1.5.0           ellipsis_0.3.2         
 [76] fitdistrplus_1.1-11     ROCR_1.0-11             nlme_3.1-164           
 [79] bit64_4.0.5             RcppAnnoy_0.0.22        rprojroot_2.0.4        
 [82] bslib_0.6.1             irlba_2.3.5.1           KernSmooth_2.23-22     
 [85] colorspace_2.1-0        DBI_1.2.1               tidyselect_1.2.1       
 [88] processx_3.8.3          bit_4.0.5               compiler_4.3.3         
 [91] git2r_0.33.0            DelayedArray_0.28.0     plotly_4.10.4          
 [94] scales_1.3.0            lmtest_0.9-40           callr_3.7.3            
 [97] digest_0.6.34           goftest_1.2-3           spatstat.utils_3.0-4   
[100] rmarkdown_2.29          XVector_0.42.0          htmltools_0.5.8.1      
[103] pkgconfig_2.0.3         fastmap_1.1.1           rlang_1.1.6            
[106] GlobalOptions_0.1.2     htmlwidgets_1.6.4       shiny_1.8.0            
[109] farver_2.1.1            jquerylib_0.1.4         zoo_1.8-12             
[112] jsonlite_1.8.8          mclust_6.1              RCurl_1.98-1.14        
[115] magrittr_2.0.3          GenomeInfoDbData_1.2.11 munsell_0.5.0          
[118] Rcpp_1.0.12             viridis_0.6.5           reticulate_1.42.0      
[121] stringi_1.8.3           zlibbioc_1.48.0         MASS_7.3-60.0.1        
[124] plyr_1.8.9              parallel_4.3.3          listenv_0.9.1          
[127] ggrepel_0.9.5           deldir_2.0-2            Biostrings_2.70.2      
[130] graphlayouts_1.1.0      splines_4.3.3           tensor_1.5             
[133] hms_1.1.3               circlize_0.4.15         locfit_1.5-9.8         
[136] ps_1.7.6                spatstat.geom_3.2-8     reshape2_1.4.4         
[139] evaluate_0.23           renv_1.1.4              BiocManager_1.30.22    
[142] tzdb_0.4.0              foreach_1.5.2           tweenr_2.0.3           
[145] httpuv_1.6.14           RANN_2.6.1              polyclip_1.10-6        
[148] future_1.33.1           clue_0.3-65             scattermore_1.2        
[151] ggforce_0.4.2           xtable_1.8-4            later_1.3.2            
[154] viridisLite_0.4.2       memoise_2.0.1           cluster_2.1.6          
[157] timechange_0.3.0        globals_0.16.2