TCGAbiolinks-Case study n. 1: Pan Cancer downstream analysis BRCA
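I have just started working with TCGA data and want to learn how to download, process, and analyze it. Among the commonly used analysis packages, I chose the R package TCGAbiolinks. These are my notes on the process. The official TCGAbiolinks tutorial is already fairly complete, so I followed it to learn the workflow and ran the analysis in the Docker-packaged environment provided by weinfo.
Below, we analyze expression profiles from breast cancer (BRCA) normal samples and their paired tumor samples.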
# Read the sample barcodes from a local spreadsheet (column `cases`)
library(openxlsx)
data <- read.xlsx("barcode.xlsx")
cc <- as.character(data$cases)
# 1. Query, download, and prepare the expression data
library(SummarizedExperiment)
library(TCGAbiolinks)
# Note: legacy = TRUE targets the GDC legacy archive; newer TCGAbiolinks
# releases may no longer support this argument.
query.exp <- GDCquery(project = "TCGA-BRCA",
                      legacy = TRUE,
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq",
                      file.type = "results",
                      experimental.strategy = "RNA-Seq",
                      barcode = cc)
GDCdownload(query.exp)
brca.exp <- GDCprepare(query = query.exp, save = TRUE, save.filename = "brcaExp.rda")
# Get subtype information
dataSubt <- TCGAquery_subtype(tumor = "BRCA")
# Get clinical data
dataClin <- GDCquery_clinic(project = "TCGA-BRCA", type = "clinical")
# Rename two columns by position. The positions depend on the layout of the
# clinical table, so check names(dataClin) before relying on indices 16 and 8.
names(dataClin)[16] <- "days_to_last_followup"
names(dataClin)[8] <- "age_at_initial_pathologic_diagnosis"
# Which samples are primary solid tumor (TP)
dataSmTP <- TCGAquery_SampleTypes(getResults(query.exp, cols = "cases"), "TP")
# Which samples are solid tissue normal (NT)
dataSmNT <- TCGAquery_SampleTypes(getResults(query.exp, cols = "cases"), "NT")
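Since this analysis depends on having normal and tumor samples from the same patients, it can help to check the pairing directly from the barcodes. The following is a minimal base R sketch, not part of TCGAbiolinks: `parse_barcode` is a hypothetical helper, and the barcodes are made up for illustration. It assumes the standard TCGA barcode layout, where the first 12 characters identify the patient and characters 14-15 hold the sample type code ("01" for primary solid tumor, "11" for solid tissue normal).

```r
# Hypothetical helper (not a TCGAbiolinks function): split TCGA barcodes
# into patient ID and sample type code using fixed character positions.
parse_barcode <- function(barcodes) {
  data.frame(
    barcode = barcodes,
    patient = substr(barcodes, 1, 12),       # e.g. "TCGA-XX-0001"
    sample_code = substr(barcodes, 14, 15),  # "01" = tumor, "11" = normal
    stringsAsFactors = FALSE
  )
}

# Made-up barcodes for illustration only
example_barcodes <- c(
  "TCGA-XX-0001-01A-11R-0000-07",
  "TCGA-XX-0001-11A-11R-0000-07",
  "TCGA-XX-0002-01A-11R-0000-07"
)
parsed <- parse_barcode(example_barcodes)

# Patients that have both a tumor and a normal sample
tumor_patients <- parsed$patient[parsed$sample_code == "01"]
normal_patients <- parsed$patient[parsed$sample_code == "11"]
paired_patients <- intersect(tumor_patients, normal_patients)
print(paired_patients)
```

With real data, the same helper could be applied to `cc` to confirm that every patient contributes one sample of each type before running the comparison.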
# 2. Preprocess, normalize, and filter the expression matrix
dataPrep <- TCGAanalyze_Preprocessing(object = brca.exp, cor.cut = 0.6)
dataNorm <- TCGAanalyze_Normalization(tabDF = dataPrep,
                                      geneInfo = geneInfo,
                                      method = "gcContent")
dataFilt <- TCGAanalyze_Filtering(tabDF = dataNorm,
                                  method = "quantile",
                                  qnt.cut = 0.25)
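To make the quantile filter concrete, here is a small base R sketch of the idea as I understand it: compute each gene's mean expression across samples, then drop genes whose mean falls at or below the chosen quantile (0.25 here). The toy matrix and variable names are my own, and the exact comparison used inside `TCGAanalyze_Filtering` is an assumption on my part.

```r
# Toy expression matrix: 8 genes x 3 samples, with row means 10, 20, ..., 80
toy <- matrix(rep(seq(10, 80, by = 10), times = 3),
              nrow = 8,
              dimnames = list(paste0("gene", 1:8), paste0("s", 1:3)))

# Mean expression per gene and the 25% quantile of those means
gene_means <- rowMeans(toy)
threshold <- quantile(gene_means, probs = 0.25)

# Keep only genes whose mean expression is above the threshold
toy_filt <- toy[gene_means > threshold, , drop = FALSE]
print(rownames(toy_filt))
```

In this toy case the two lowest-expressed genes are removed and six remain.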
dataDEGs <- TCGAanalyze_DEA(mat1 = dataFilt[, dataSmNT],
                            mat2 = dataFilt[, dataSmTP],
                            Cond1type = "Normal",
                            Cond2type = "Tumor",
                            fdr.cut = 0.01,
                            logFC.cut = 1,
                            method = "glmLRT")
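The two cutoffs above, `fdr.cut = 0.01` and `logFC.cut = 1`, select genes that are both statistically significant and changed at least two-fold. The sketch below applies the same kind of selection to a made-up result table in base R. The table values are invented, and whether the package uses strict or non-strict comparisons at the boundary is something I have not verified.

```r
# Made-up differential expression results for five genes
toy_dea <- data.frame(
  logFC = c(2.5, -1.8, 0.4, 3.1, -0.2),
  FDR   = c(0.001, 0.005, 0.0001, 0.2, 0.5),
  row.names = paste0("gene", 1:5)
)

fdr_cut <- 0.01
logfc_cut <- 1

# A gene passes only if it is significant AND its fold change is large enough
keep <- toy_dea$FDR < fdr_cut & abs(toy_dea$logFC) > logfc_cut
toy_degs <- toy_dea[keep, ]
print(rownames(toy_degs))
```

Here gene3 is significant but barely changes, and gene4 changes strongly but is not significant, so neither passes.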
# 3. Enrichment analysis of the differentially expressed genes
ansEA <- TCGAanalyze_EAcomplete(TFname = "DEA genes Normal Vs Tumor",
                                RegulonList = rownames(dataDEGs))
TCGAvisualize_EAbarplot(tf = rownames(ansEA$ResBP),
                        GOBPTab = ansEA$ResBP,
                        GOCCTab = ansEA$ResCC,
                        GOMFTab = ansEA$ResMF,
                        PathTab = ansEA$ResPat,
                        nRGTab = rownames(dataDEGs),
                        nBar = 20)
# 4. Kaplan-Meier survival analysis for the differentially expressed genes
group1 <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("NT"))
group2 <- TCGAquery_SampleTypes(colnames(dataFilt), typesample = c("TP"))
dataSurv <- TCGAanalyze_SurvivalKM(clinical_patient = dataClin,
                                   dataGE = dataFilt,
                                   Genelist = rownames(dataDEGs),
                                   Survresult = FALSE,
                                   ThreshTop = 0.67,
                                   ThreshDown = 0.33,
                                   p.cut = 0.05,
                                   group1 = group1,
                                   group2 = group2)
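The `ThreshTop` and `ThreshDown` arguments define which samples count as high or low expressers of a gene before their survival is compared. As a rough illustration of that split (my reading of the arguments, not the package's actual code), the base R sketch below divides nine made-up samples by the 0.67 and 0.33 quantiles of a single gene's expression. Samples between the two cutoffs belong to neither group.

```r
# Made-up expression values of one gene across nine tumor samples
expr <- setNames(as.numeric(1:9), paste0("s", 1:9))

# Quantile cutoffs corresponding to ThreshTop = 0.67 and ThreshDown = 0.33
q_top <- quantile(expr, probs = 0.67)
q_down <- quantile(expr, probs = 0.33)

# Samples above the upper cutoff and below the lower cutoff
high_group <- names(expr)[expr > q_top]
low_group <- names(expr)[expr < q_down]

print(high_group)
print(low_group)
```

The survival comparison itself would then be run between `high_group` and `low_group`, keeping genes whose p-value falls below `p.cut`.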
# 5. Cox network visualization of the survival-associated genes with dnet
require(dnet) # to change
org.Hs.string <- dRDataLoader(RData = "org.Hs.string")
TabCoxNet <- TCGAvisualize_SurvivalCoxNET(dataClin,
                                          dataFilt,
                                          Genelist = rownames(dataSurv),
                                          scoreConfidence = 100,
                                          org.Hs.string = org.Hs.string,
                                          titlePlot = "Case Study n.1 dnet")