Last updated: 2021-04-21
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it's best to always run the code in an empty environment.
set.seed(20200302) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version e3c4003. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/figures.nb.html
    Ignored:    code/.DS_Store
    Ignored:    code/.Rhistory
    Ignored:    code/.job/
    Ignored:    code/old/
    Ignored:    data/.DS_Store
    Ignored:    data/annotations/
    Ignored:    data/cache-intermediates/
    Ignored:    data/cache-region/
    Ignored:    data/cache-rnaseq/
    Ignored:    data/cache-runtime/
    Ignored:    data/datasets/.DS_Store
    Ignored:    data/datasets/GSE110554-data.RData
    Ignored:    data/datasets/GSE120854/
    Ignored:    data/datasets/GSE120854_RAW.tar
    Ignored:    data/datasets/GSE135446-data.RData
    Ignored:    data/datasets/GSE135446/
    Ignored:    data/datasets/GSE135446_RAW.tar
    Ignored:    data/datasets/GSE45459-data.RData
    Ignored:    data/datasets/GSE45459_Matrix_signal_intensities.txt
    Ignored:    data/datasets/GSE45460/
    Ignored:    data/datasets/GSE45460_RAW.tar
    Ignored:    data/datasets/GSE95460_RAW.tar
    Ignored:    data/datasets/GSE95460_RAW/
    Ignored:    data/datasets/GSE95462-data.RData
    Ignored:    data/datasets/GSE95462/
    Ignored:    data/datasets/GSE95462_RAW/
    Ignored:    data/datasets/SRP100803/
    Ignored:    data/datasets/SRP125125/.DS_Store
    Ignored:    data/datasets/SRP125125/SRR6298*/
    Ignored:    data/datasets/SRP125125/SRR_Acc_List.txt
    Ignored:    data/datasets/SRP125125/SRR_Acc_List_Full.txt
    Ignored:    data/datasets/SRP125125/SraRunTable.txt
    Ignored:    data/datasets/SRP125125/multiqc_data/
    Ignored:    data/datasets/SRP125125/multiqc_report.html
    Ignored:    data/datasets/SRP125125/quants/
    Ignored:    data/datasets/SRP166862/
    Ignored:    data/datasets/SRP217468/
    Ignored:    data/datasets/TCGA.BRCA.rds
    Ignored:    data/datasets/TCGA.KIRC.rds
    Ignored:    data/misc/
    Ignored:    output/--exclude
    Ignored:    output/.DS_Store
    Ignored:    output/FDR-analysis/
    Ignored:    output/compare-methods/
    Ignored:    output/figures/
    Ignored:    output/methylgsa-params/
    Ignored:    output/outputs-1.tar.gz
    Ignored:    output/outputs.tar.gz
    Ignored:    output/random-cpg-sims/

Untracked files:
    Untracked:  analysis/old/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/gettingStarted.Rmd) and HTML (docs/gettingStarted.html) files. If you've configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
|File|Version|Author|Date|Message|
|Rmd|9d23572|JovMaksimovic|2021-04-20|wflow_publish(rownames(stat$status)[stat$status$modified == TRUE])|
|Rmd|91699a8|JovMaksimovic|2020-08-14|wflow_publish("analysis/_site.yml", republish = TRUE, all = TRUE)|
This page describes how to download the data and code used in this analysis, set up the project directory and rerun the analysis. We have used the workflowr package to organise the analysis and insert reproducibility information into the output documents.
All the code and outputs of analysis are available from GitHub at https://github.com/Oshlack/methyl-geneset-testing. If you want to replicate the analysis you can either fork the repository and clone it or download the repository as a zipped directory.
Once you have a local copy of the repository you should see the following directory structure:
analysis/- Contains the RMarkdown documents with the various stages of analysis. These are numbered according to the order they should be run.
data/- This directory contains the data files used in the analysis with sub-directories for different data types (see Getting the data for details). Processed intermediate data files will also be placed here.
output/- Directory for output files produced by the analysis; each analysis step has its own sub-directory.
docs/- This directory contains the analysis website hosted at http://oshlacklab.com/methyl-geneset-testing, including image files.
code/- R scripts with custom functions used in some analysis stages. There are sub-directories for scripts associated with different steps in the analysis.
README.md- README describing the project.
.Rprofile- Custom R profile for the project including set up for
.gitignore- Details of files and directories that are excluded from the repository.
_workflowr.yml- Workflowr configuration file.
methyl-geneset-testing.Rproj- RStudio project file.
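As a quick sanity check after cloning or unzipping, the layout listed above can be verified from R. The following is only a minimal sketch: `check_project_layout()` is a hypothetical helper that is not part of the repository, and it assumes R is started in (or pointed at) the project root.

```r
# Top-level directories and files described in the list above.
expected_dirs  <- c("analysis", "data", "output", "docs", "code")
expected_files <- c("README.md", "_workflowr.yml", "methyl-geneset-testing.Rproj")

# check_project_layout() is a hypothetical helper, not part of the repository.
# It returns a named logical vector: TRUE where the expected entry exists.
check_project_layout <- function(root = ".") {
  dirs_ok  <- dir.exists(file.path(root, expected_dirs))
  files_ok <- file.exists(file.path(root, expected_files))
  stats::setNames(c(dirs_ok, files_ok), c(expected_dirs, expected_files))
}

# Report anything that appears to be missing from the current directory.
layout  <- check_project_layout(".")
missing <- names(layout)[!layout]
if (length(missing) > 0) {
  message("Missing from project root: ", paste(missing, collapse = ", "))
}
```

Hidden files such as .Rprofile and .gitignore are left out of the check on purpose, since some download methods handle them differently.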
In this project we have used data from several publicly available datasets. These include flow-sorted blood cell methylation data generated using Illumina HumanMethylationEPIC arrays, and normal kidney methylation data from The Cancer Genome Atlas (TCGA) kidney clear-cell carcinoma (KIRC) cohort, which was generated using Illumina HumanMethylation450 arrays. Both are automatically downloaded as part of the analysis directly from the Bioconductor ExperimentHub.
Once the RNAseq data has been downloaded it needs to be extracted, placed in the correct directories and quasi-mapped and quantified using Salmon. The approach we took is described here. The downstream analysis code assumes the following directory structure inside the
We use pre B-cell development Affymetrix gene expression array data, which can be downloaded from GEO at GSE45460.
For downstream analysis the CEL files for each sample are expected to be present in the following directory structure:
We use publicly available 450K data generated from developing human B-cells, which can be downloaded from GSE45459. Specifically, the GSE45459_Matrix_signal_intensities.txt.gz file should be downloaded and placed in the data/datasets directory and unzipped using
Some additional data files used during the analysis are provided as part of the repository.
Intermediate data files created during the analysis will be placed in:
These are used by later stages of the analysis so should not be moved, altered or deleted.
The analysis directory contains the following analysis files:
"01_exploreArrayBiasEPIC.Rmd"
"02_exploreArrayBias450.Rmd"
"03_fdrAnalysisBRCA.Rmd"
"03_fdrAnalysisKIRC.Rmd"
"04_expressionGenesets.Rmd"
"04_expressionGenesetsBcells.Rmd"
"05_compareMethods.Rmd"
"05_compareMethodsBcells.Rmd"
"06_runTimeComparison.Rmd"
"07_regionAnalysis.Rmd"
"07_regionAnalysisBcells.Rmd"
"08_methylGSAParamSweep.Rmd"
As indicated by the numbering they should be run in this order. If you want to rerun the entire analysis this can be easily done using
workflowr::wflow_build(republish = TRUE)
It is important to consider the computer and environment you are using before doing this. Running this analysis from scratch requires a considerable amount of time, disk space and memory. Some stages of the analysis need to be executed on an HPC to generate results required by downstream steps. If you do not have access to an HPC to perform these analyses using the code provided, you can download pre-computed RDS files containing the results from .
To use the pre-computed RDS objects, after cloning or downloading the GitHub repository to your computer, please extract the outputs.tar.gz archive under the output directory, using tar -xvf outputs.tar.gz.
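If you prefer to stay inside R, the same extraction can be sketched with `utils::untar()`. This is an illustration only: `extract_outputs()` is a hypothetical helper that is not part of the repository, and it assumes the archive has already been saved as output/outputs.tar.gz relative to the project root.

```r
# extract_outputs() is a hypothetical helper, not part of the repository.
# It assumes outputs.tar.gz has already been placed in <root>/output.
extract_outputs <- function(root = ".") {
  out_dir <- file.path(root, "output")
  archive <- file.path(out_dir, "outputs.tar.gz")
  if (!file.exists(archive)) {
    stop("Archive not found: ", archive)
  }
  # Extract into the output directory, mirroring `tar -xvf` run from there.
  utils::untar(archive, exdir = out_dir)
  # Return (invisibly) the files now present under output/.
  invisible(list.files(out_dir, recursive = TRUE))
}

# Only attempt the extraction if the archive is actually present.
if (file.exists(file.path("output", "outputs.tar.gz"))) {
  extract_outputs(".")
}
```

Failing early when the archive is missing avoids a partly populated output directory that later analysis stages might mistake for complete results.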
It is also possible to run individual stages of the analysis, either by providing the names of the files you want to run to
workflowr::wflow_build() or by manually knitting the document (for example using the 'Knit' button in RStudio).
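A single stage could be rebuilt along the following lines. This is a rough sketch rather than the project's own tooling: `ordered_stages()` and `build_stage()` are hypothetical helpers, and the code assumes the numbered `.Rmd` files live in analysis/ and that R is started in the project root.

```r
# ordered_stages() and build_stage() are hypothetical helpers,
# not part of the repository.

# List the numbered analysis files in the order implied by their prefixes.
ordered_stages <- function(analysis_dir = "analysis") {
  files <- list.files(analysis_dir, pattern = "^[0-9]{2}_.*\\.Rmd$")
  # Radix sort gives a locale-independent ordering.
  sort(files, method = "radix")
}

# Build one stage by passing its path to workflowr::wflow_build().
build_stage <- function(stage, analysis_dir = "analysis") {
  path <- file.path(analysis_dir, stage)
  if (!file.exists(path)) {
    stop("No such analysis file: ", path)
  }
  if (!requireNamespace("workflowr", quietly = TRUE)) {
    stop("The workflowr package is required to build this file")
  }
  workflowr::wflow_build(files = path)
}

# Show the stages found in the current project (empty if none are present).
stages <- ordered_stages("analysis")
print(stages)

# Example call (not run here):
# build_stage("01_exploreArrayBiasEPIC.Rmd")
```

Later stages read intermediate files written by earlier ones, so rebuilding a stage on its own is only sensible once its upstream stages have been run or the pre-computed outputs have been extracted.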
Once all the analyses have been rerun, the manuscript figures can be generated using the code provided here.
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
stats graphics grDevices utils datasets methods base

other attached packages:
workflowr_1.6.2

loaded via a namespace (and not attached):
Rcpp_1.0.6 whisker_0.4 knitr_1.31 magrittr_2.0.1
here_1.0.1 R6_2.5.0 rlang_0.4.10 fansi_0.4.2
stringr_1.4.0 tools_4.0.3 xfun_0.22 utf8_1.2.1
git2r_0.28.0 jquerylib_0.1.3 htmltools_0.5.1.1 ellipsis_0.3.1
rprojroot_2.0.2 yaml_2.2.1 digest_0.6.27 tibble_3.1.0
lifecycle_1.0.0 crayon_1.4.1 later_126.96.36.199 sass_0.3.1
vctrs_0.3.7 promises_188.8.131.52 fs_1.5.0 glue_1.4.2
evaluate_0.14 rmarkdown_2.7 stringi_1.5.3 bslib_0.2.4
compiler_4.0.3 pillar_1.5.1 jsonlite_1.7.2 httpuv_1.5.5
pkgconfig_2.0.3