Last updated: 2021-04-21
Knit directory: methyl-geneset-testing/
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
The command set.seed(20200302) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
The results in this page were generated with repository version e3c4003. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/figures.nb.html
Ignored: code/.DS_Store
Ignored: code/.Rhistory
Ignored: code/.job/
Ignored: code/old/
Ignored: data/.DS_Store
Ignored: data/annotations/
Ignored: data/cache-intermediates/
Ignored: data/cache-region/
Ignored: data/cache-rnaseq/
Ignored: data/cache-runtime/
Ignored: data/datasets/.DS_Store
Ignored: data/datasets/GSE110554-data.RData
Ignored: data/datasets/GSE120854/
Ignored: data/datasets/GSE120854_RAW.tar
Ignored: data/datasets/GSE135446-data.RData
Ignored: data/datasets/GSE135446/
Ignored: data/datasets/GSE135446_RAW.tar
Ignored: data/datasets/GSE45459-data.RData
Ignored: data/datasets/GSE45459_Matrix_signal_intensities.txt
Ignored: data/datasets/GSE45460/
Ignored: data/datasets/GSE45460_RAW.tar
Ignored: data/datasets/GSE95460_RAW.tar
Ignored: data/datasets/GSE95460_RAW/
Ignored: data/datasets/GSE95462-data.RData
Ignored: data/datasets/GSE95462/
Ignored: data/datasets/GSE95462_RAW/
Ignored: data/datasets/SRP100803/
Ignored: data/datasets/SRP125125/.DS_Store
Ignored: data/datasets/SRP125125/SRR6298*/
Ignored: data/datasets/SRP125125/SRR_Acc_List.txt
Ignored: data/datasets/SRP125125/SRR_Acc_List_Full.txt
Ignored: data/datasets/SRP125125/SraRunTable.txt
Ignored: data/datasets/SRP125125/multiqc_data/
Ignored: data/datasets/SRP125125/multiqc_report.html
Ignored: data/datasets/SRP125125/quants/
Ignored: data/datasets/SRP166862/
Ignored: data/datasets/SRP217468/
Ignored: data/datasets/TCGA.BRCA.rds
Ignored: data/datasets/TCGA.KIRC.rds
Ignored: data/misc/
Ignored: output/--exclude
Ignored: output/.DS_Store
Ignored: output/FDR-analysis/
Ignored: output/compare-methods/
Ignored: output/figures/
Ignored: output/methylgsa-params/
Ignored: output/outputs-1.tar.gz
Ignored: output/outputs.tar.gz
Ignored: output/random-cpg-sims/
Untracked files:
Untracked: analysis/old/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/gettingStarted.Rmd) and HTML (docs/gettingStarted.html) files. If you've configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | e3c4003 | JovMaksimovic | 2021-04-21 | wflow_publish("analysis/gettingStarted.Rmd") |
html | 0f8d628 | JovMaksimovic | 2021-04-20 | Build site. |
Rmd | 9d23572 | JovMaksimovic | 2021-04-20 | wflow_publish(rownames(stat$status)[stat$status$modified == TRUE]) |
html | d3675c5 | JovMaksimovic | 2020-08-28 | Build site. |
Rmd | 562f140 | JovMaksimovic | 2020-08-28 | wflow_publish(c("analysis/03_expressionGenesets.Rmd", "analysis/gettingStarted.Rmd", |
html | e66c39a | JovMaksimovic | 2020-08-14 | Build site. |
Rmd | ad9e7be | JovMaksimovic | 2020-08-14 | wflow_publish(c("analysis/figures.Rmd", "analysis/gettingStarted.Rmd", |
html | 555069b | JovMaksimovic | 2020-08-14 | Build site. |
Rmd | 91699a8 | JovMaksimovic | 2020-08-14 | wflow_publish("analysis/_site.yml", republish = TRUE, all = TRUE) |
html | e162725 | JovMaksimovic | 2020-07-27 | Build site. |
Rmd | ea6f88d | JovMaksimovic | 2020-07-27 | wflow_publish(c("analysis/index.Rmd", "analysis/gettingStarted.Rmd")) |
html | d439b32 | JovMaksimovic | 2020-07-27 | Build site. |
Rmd | 6278674 | JovMaksimovic | 2020-07-27 | wflow_publish(c("analysis/index.Rmd", "analysis/gettingStarted.Rmd")) |
This page describes how to download the data and code used in this analysis, set up the project directory and rerun the analysis. We have used the workflowr package to organise the analysis and insert reproducibility information into the output documents.
All the code and outputs of analysis are available from GitHub at https://github.com/Oshlack/methyl-geneset-testing. If you want to replicate the analysis you can either fork the repository and clone it or download the repository as a zipped directory.
Once you have a local copy of the repository you should see the following directory structure:
- analysis/: Contains the R Markdown documents for the various stages of the analysis. These are numbered according to the order in which they should be run.
- data/: Contains the data files used in the analysis, with sub-directories for different data types (see Getting the data for details). Processed intermediate data files will also be placed here.
- output/: Directory for output files produced by the analysis; each analysis step has its own sub-directory.
- docs/: Contains the analysis website hosted at http://oshlacklab.com/methyl-geneset-testing, including image files.
- code/: R scripts with custom functions used in some analysis stages. There are sub-directories for scripts associated with different steps in the analysis.
- README.md: README describing the project.
- .Rprofile: Custom R profile for the project, including set up for workflowr.
- .gitignore: Details of files and directories that are excluded from the repository.
- _workflowr.yml: workflowr configuration file.
- methyl-geneset-testing.Rproj: RStudio project file.

In this project we have used data from several publicly available datasets. We use flow-sorted blood cell methylation data generated using Illumina HumanMethylationEPIC arrays, and normal kidney methylation data from The Cancer Genome Atlas (TCGA) kidney clear-cell carcinoma (KIRC) cohort, which was generated using Illumina HumanMethylation450 arrays. Both are downloaded automatically from the Bioconductor ExperimentHub as part of the analysis.
We use a flow-sorted, blood cell RNAseq dataset, which can be downloaded from GEO at GSE107011 or SRA at SRP125125.
Once the RNAseq data has been downloaded it needs to be extracted, placed in the correct directories and quasi-mapped and quantified using Salmon. The approach we took is described here. The downstream analysis code assumes the following directory structure inside the data/ directory:

    datasets
        SRP125125
            quants
                SRR6298258_quant
                SRR6298376_quant
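
Before running the downstream RNAseq stages, it can help to confirm that the quantification output sits where the code expects it. The following is a minimal sketch and not part of the original analysis: `find_quant_files()` is a hypothetical helper, and it assumes each `*_quant` directory contains Salmon's standard `quant.sf` output file.

```r
# Hypothetical helper (not from the repository): locate Salmon quant.sf files
# under <data_dir>/datasets/SRP125125/quants/<run>_quant/.
find_quant_files <- function(data_dir = "data") {
  quant_root <- file.path(data_dir, "datasets", "SRP125125", "quants")

  # Keep only sub-directories whose names end in "_quant"
  run_dirs <- list.files(quant_root, pattern = "_quant$", full.names = TRUE)
  run_dirs <- run_dirs[dir.exists(run_dirs)]

  # Assumption: each run directory holds a quant.sf file written by Salmon
  quant_files <- file.path(run_dirs, "quant.sf")
  names(quant_files) <- sub("_quant$", "", basename(run_dirs))

  missing <- !file.exists(quant_files)
  if (any(missing)) {
    warning("No quant.sf found for: ",
            paste(names(quant_files)[missing], collapse = ", "))
  }
  quant_files[!missing]
}
```

Calling `find_quant_files("data")` from the project root returns a named character vector of paths, one per run accession, and warns about any run directory that lacks a `quant.sf` file.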
We use pre B-cell development Affymetrix gene expression array data, which can be downloaded from GEO at GSE45460.
For downstream analysis the CEL files for each sample are expected to be present in the following directory structure:
    data
        datasets
            GSE45460
We use publicly available 450K data generated from developing human B-cells, which can be downloaded from GEO at GSE45459. Specifically, the GSE45459_Matrix_signal_intensities.txt.gz file should be downloaded, placed in the data/datasets directory and unzipped using gunzip GSE45459_Matrix_signal_intensities.txt.gz.
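
If the `gunzip` command line tool is not available (for example on Windows), the file can also be decompressed from within R. The sketch below uses only base R connections; `gunzip_file()` is a hypothetical helper name and is not part of the repository. Unlike `gunzip`, it leaves the original `.gz` file in place.

```r
# Hypothetical helper: decompress a .gz file in chunks using base R connections.
gunzip_file <- function(src, dest = sub("\\.gz$", "", src), chunk_size = 1e6) {
  stopifnot(file.exists(src), dest != src)

  con_in <- gzfile(src, open = "rb")   # reads decompressed bytes
  on.exit(close(con_in), add = TRUE)
  con_out <- file(dest, open = "wb")
  on.exit(close(con_out), add = TRUE)

  # Copy in chunks so large files are never held fully in memory
  repeat {
    chunk <- readBin(con_in, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break
    writeBin(chunk, con_out)
  }
  invisible(dest)
}

# Example usage from the project root (path taken from the text above):
# gunzip_file("data/datasets/GSE45459_Matrix_signal_intensities.txt.gz")
```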
Some additional data files used during the analysis are provided as part of the repository.
    genesets
        GO-immune-system-process.txt
        kegg-immune-related-pathways.csv
    datasets
        SRP125125
            SraRunTableFull.txt
Intermediate data files created during the analysis will be placed in:
annotations
cache-intermediates
cache-region
cache-rnaseq
cache-runtime
These are used by later stages of the analysis so should not be moved, altered or deleted.
The analysis directory contains the following analysis files:
[1] "01_exploreArrayBiasEPIC.Rmd" "02_exploreArrayBias450.Rmd"
[3] "03_fdrAnalysisBRCA.Rmd" "03_fdrAnalysisKIRC.Rmd"
[5] "04_expressionGenesets.Rmd" "04_expressionGenesetsBcells.Rmd"
[7] "05_compareMethods.Rmd" "05_compareMethodsBcells.Rmd"
[9] "06_runTimeComparison.Rmd" "07_regionAnalysis.Rmd"
[11] "07_regionAnalysisBcells.Rmd" "08_methylGSAParamSweep.Rmd"
As indicated by the numbering, they should be run in this order. If you want to rerun the entire analysis, this can be done using workflowr.
workflowr::wflow_build(republish = TRUE)
It is important to consider the computer and environment you are using before doing this. Running this analysis from scratch requires a considerable amount of time, disk space and memory. Some stages of the analysis need to be executed on an HPC to generate results required by downstream steps. If you do not have access to an HPC to perform these analyses using the code provided, you can download pre-computed RDS files containing the results from .
To use the pre-computed RDS objects, after cloning or downloading the GitHub repository to your computer, please extract the outputs.tar.gz archive under the output directory, using tar -xvf outputs.tar.gz.
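
The same extraction can be done from an R session with `utils::untar()`. This is a sketch under the assumption that the archive has been saved as `output/outputs.tar.gz` relative to the project root; `extract_outputs()` is a hypothetical helper, not a function provided by the repository.

```r
# Hypothetical helper: extract the pre-computed results archive into output/.
extract_outputs <- function(archive = file.path("output", "outputs.tar.gz"),
                            exdir = dirname(archive)) {
  stopifnot(file.exists(archive))

  # untar() returns 0 (invisibly) on success
  status <- utils::untar(archive, exdir = exdir)
  if (!identical(as.integer(status), 0L)) {
    stop("untar failed with status ", status)
  }

  # Return the files now present under exdir so the result can be inspected
  invisible(list.files(exdir, recursive = TRUE))
}
```

Extracting into the directory that holds the archive mirrors running `tar -xvf outputs.tar.gz` from inside `output`.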
It is also possible to run individual stages of the analysis, either by providing the names of the files you want to run to workflowr::wflow_build() or by manually knitting the document (for example using the 'Knit' button in RStudio).
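
As an illustration of building selected stages, the sketch below lists the numbered R Markdown documents in run order so that one or more of them can be passed to `workflowr::wflow_build()`. The helper `numbered_rmds()` is hypothetical and assumes file names start with a numeric prefix followed by an underscore, as in the listing above.

```r
# Hypothetical helper: list the numbered analysis documents in run order.
numbered_rmds <- function(analysis_dir = "analysis") {
  rmds <- list.files(analysis_dir, pattern = "^[0-9]+_.*\\.Rmd$")

  # Sort by numeric prefix first, then by file name within a stage
  stage <- as.integer(sub("_.*$", "", rmds))
  file.path(analysis_dir, rmds[order(stage, rmds)])
}

# Example usage from the project root (requires the workflowr package):
# stages <- numbered_rmds()
# workflowr::wflow_build(stages[1])   # build only the first stage
# workflowr::wflow_build(stages)      # build all numbered stages
```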
Once all the analyses have been rerun, the manuscript figures can be generated using the code provided here.
devtools::session_info()
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] workflowr_1.6.2
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 whisker_0.4 knitr_1.31 magrittr_2.0.1
[5] here_1.0.1 R6_2.5.0 rlang_0.4.10 fansi_0.4.2
[9] stringr_1.4.0 tools_4.0.3 xfun_0.22 utf8_1.2.1
[13] git2r_0.28.0 jquerylib_0.1.3 htmltools_0.5.1.1 ellipsis_0.3.1
[17] rprojroot_2.0.2 yaml_2.2.1 digest_0.6.27 tibble_3.1.0
[21] lifecycle_1.0.0 crayon_1.4.1 later_1.1.0.1 sass_0.3.1
[25] vctrs_0.3.7 promises_1.2.0.1 fs_1.5.0 glue_1.4.2
[29] evaluate_0.14 rmarkdown_2.7 stringi_1.5.3 bslib_0.2.4
[33] compiler_4.0.3 pillar_1.5.1 jsonlite_1.7.2 httpuv_1.5.5
[37] pkgconfig_2.0.3