tcgaBrca

Repository from the Johan Staaf lab containing scripts for downloading and preprocessing TCGA BRCA data from the Genomic Data Commons (GDC).

Description

The pipeline uses the included manifest and annotation files to automatically download and process gene expression, DNA methylation, copy-number, and somatic mutation data for all available tumor samples. A core data set of samples with data available at all levels is produced and filtered against the TCGA sample blacklists. The final data set comprises 630 unique breast cancer cases.
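The core-set logic described above can be sketched in base R as an intersection across data levels followed by blacklist removal. This is a minimal illustration only: the sample IDs, the `samples` list, and the `blacklist` vector are hypothetical and are not taken from the repository scripts.

```r
# Hypothetical sample IDs per data level (not real TCGA barcodes)
samples <- list(
  expression  = c("S1", "S2", "S3", "S4"),
  methylation = c("S2", "S3", "S4", "S5"),
  copyNumber  = c("S1", "S2", "S3", "S4", "S5"),
  mutation    = c("S2", "S3", "S4")
)

# Hypothetical blacklist of samples to exclude
blacklist <- c("S3")

# Keep only samples that have data on every level
core <- Reduce(intersect, samples)

# Remove blacklisted samples from the core set
core <- setdiff(core, blacklist)

print(core)
```

Using `Reduce(intersect, ...)` keeps the sketch independent of the number of data levels, so adding a level only requires a new list element.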

Data types

RNA-seq

Downloaded "as is". Available as counts, FPKM and upper-quartile normalized FPKM data.

SNP6-Copy number data

Downloaded "as is" with segmentation data at gene and genome level.

DNA methylation

Raw Illumina IDAT files are downloaded and normalized using the minfi package, followed by Infinium I/II scaling. Poorly performing probes are filtered using a CpG blacklist by Zhou et al. Annotations for Illumina Infinium CpGs are generated using custom scripts. Beta values are corrected for sample purity using a custom method described in Staaf & Aine 2022.
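As an illustration of the probe-filtering step only, the sketch below removes blacklisted CpGs from a beta-value matrix in base R. The matrix `betas`, the probe IDs, and `probeBlacklist` are made-up placeholders; the actual blacklist format and the filtering code in the repository may differ.

```r
# Hypothetical beta-value matrix: rows are CpG probes, columns are samples
betas <- matrix(
  c(0.10, 0.80, 0.55,
    0.20, 0.75, 0.60),
  nrow = 3,
  dimnames = list(c("cg01", "cg02", "cg03"), c("S1", "S2"))
)

# Hypothetical blacklist of poorly performing probes
probeBlacklist <- c("cg02", "cg99")

# Keep probes that are not on the blacklist
keep <- !(rownames(betas) %in% probeBlacklist)

# drop = FALSE preserves the matrix shape even if one row remains
betasFiltered <- betas[keep, , drop = FALSE]

print(betasFiltered)
```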

Whole exome sequencing

Downloaded "as is" for the MuSE, SomaticSniper, MuTect, and VarScan pipelines. Only mutations called by at least two independent methods were kept, resulting in 58 973 SNVs across 630 samples.
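The two-caller rule can be sketched in base R as follows. The `calls` table, its column names, and the way a mutation is keyed (sample, position, and alleles) are assumptions made for illustration; the repository scripts may define a matching mutation differently.

```r
# Hypothetical long-format table: one row per mutation call per caller
calls <- data.frame(
  sample = c("S1", "S1", "S1", "S2", "S2", "S2"),
  chrom  = c("chr1", "chr1", "chr2", "chr1", "chr1", "chr3"),
  pos    = c(100, 100, 200, 100, 100, 300),
  ref    = c("A", "A", "C", "A", "A", "G"),
  alt    = c("T", "T", "G", "T", "T", "A"),
  caller = c("MuSE", "MuTect", "VarScan", "MuSE", "MuSE", "SomaticSniper"),
  stringsAsFactors = FALSE
)

# Build one key per mutation per sample
key <- with(calls, paste(sample, chrom, pos, ref, alt, sep = ":"))

# Count the distinct callers supporting each mutation
nCallers <- tapply(calls$caller, key, function(x) length(unique(x)))

# Keep mutations supported by at least two different callers
kept <- names(nCallers)[nCallers >= 2]
consensus <- unique(
  calls[key %in% kept, c("sample", "chrom", "pos", "ref", "alt")]
)

print(consensus)
```

Counting distinct callers rather than rows prevents a mutation reported twice by the same caller from passing the filter.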

Usage

The included scripts perform data download, preprocessing and annotation. Processed R objects are also available on request, as running the full set of scripts can be time-consuming and has not been validated on all platforms.
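A possible way to run the numbered scripts in sequence is sketched below. It assumes that the scripts are meant to be run in numerical order from the repository root and that each can be sourced unattended; neither assumption is confirmed by this README, and individual scripts may need paths edited first. The helper `runPipeline` is hypothetical and not part of the repository.

```r
# Script names as they appear in the repository, in numerical order
scripts <- c(
  "script0_prepareWorkspace.r",
  "script1_processManifest.r",
  "script2_getCoreData.r",
  "script3_mergeCoreData.r",
  "script4_normalizeCoreMethylation.r",
  "script5_filterBlacklistedSamples.r",
  "script6_adjustBetas.r",
  "script7_annotateFeatures.r",
  "script8_createFinalWorkspace.r"
)

# Hypothetical helper: check that all scripts exist, then source them in order
runPipeline <- function(scripts, dir = ".") {
  paths <- file.path(dir, scripts)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing scripts: ", paste(missing, collapse = ", "))
  }
  for (p in paths) {
    message("Running ", p)
    source(p, echo = FALSE)
  }
  invisible(paths)
}

# Uncomment to run from the repository root (may take a long time):
# runPipeline(scripts)
```

Checking for missing files before sourcing anything avoids starting a long download only to fail at a later step.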

Repository files:

gdcTransferTool
manifest
README.md
function_correctBetas.r
script0_prepareWorkspace.r
script1_processManifest.r
script2_getCoreData.r
script3_mergeCoreData.r
script4_normalizeCoreMethylation.r
script5_filterBlacklistedSamples.r
script6_adjustBetas.r
script7_annotateFeatures.r
script8_createFinalWorkspace.r
