WGCNA 加权基因共表达网络分析教程(1)
学是很多做科研的同学都想学的,包括我在内,现阶段正在学习这个,深夜整理材料不易,请多多关照支持!
先安装Rstudio以及以及加载WGCNA数据包
Rstudio软件下载地址:
https://www.rstudio.com/products/rstudio/download/
WGCNA加载按照网站步骤操作:
https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/
范例CSV文件下载地址:链接: https://pan.baidu.com/s/1bTS30Vr080RXjJuj0-3cEg 密码: pcs4
Workshop可以在rstudio菜单栏的session-set working directory 选择你要工作的文件夹,要读取的CSV等文档都只能放在设置的working directory。
蓝色#表示注释行,黑色是代码,运行代码时的每一行命令是不需要分号“;”。
以下是代码数据,这个是我看学习的教程,有些我也不懂,粘贴出来一起学习。
1.a Loading expression data
# Display the current working directory
getwd();
# If necessary, change the path below to the directory where the data files are stored.
# "." means current directory. On Windows use a forward slash / instead of the usual \.workingDir = ".";setwd(workingDir);
# Load the WGCNA package
library(WGCNA);
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE);
#Read in the female liver data set
femData = read.csv("LiverFemale3600.csv");
# Take a quick look at what is in the data set:
dim(femData);
names(femData);
#The expression data set contains 135 samples. Note that each row corresponds to a gene and column to a sample or auxiliary information.(创建原始表达数据集datExpr0)
datExpr0 = as.data.frame(t(femData[, -c(1:8)]));#去掉femData数据集中第一列至第八列后的数据集
names(datExpr0) = femData$substanceBXH;
rownames(datExpr0) = names(femData)[-c(1:8)];
1.b Checking data for excessive missing values and identification of outlier microarray samples
gsg = goodSamplesGenes(datExpr0, verbose = 3);
gsg$allOK
#If the last statement (gsg$allOK)returns TRUE, all genes have passed the cuts. If not, we remove the o↵ending genes and samples from the data:
if (!gsg$allOK)
{
# Optionally, print the gene and sample names that were removed:
if (sum(!gsg$goodGenes)>0)
printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", ")));
if (sum(!gsg$goodSamples)>0)
printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));
# Remove the offending genes and samples from the data:
datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]
}
#Next we cluster the samples (in contrast to clustering genes that will come later) to see if there are any obvious outliers.(样本聚类筛查异常值,异常样本sample F2_221, see Fig. 1)
)
sampleTree = hclust(dist(datExpr0), method = "average");# Plot the sample tree: Open a graphic output window of size 12 by 9 inches
# The user should change the dimensions if the window is too large or too small.
sizeGrWindow(12,9)#pdf(file = "Plots/sampleClustering.pdf", width = 12, height = 9);
par(cex = 0.6);
par(mar = c(0,4,2,0))
plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5, cex.axis = 1.5, cex.main = 2)
# Plot a line to show the cut
abline(h = 15, col = "red");# Determine cluster under the lineclust = cutreeStatic(sampleTree, cutHeight = 15, minSize = 10)table(clust)# clust 1 contains the samples we want to keep.
keepSamples = (clust==1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
#The variable datExpr now contains the expression data ready for network analysis.
1.c Loading clinical trait data(加载临床处理数据)
We now read in the trait data and match the samples for which they were measured to the expression samples.
traitData = read.csv("ClinicalTraits.csv");
dim(traitData)
names(traitData)
# remove columns that hold information we do not need.
allTraits = traitData[, -c(31, 16)];
allTraits = allTraits[, c(2, 11:36) ];
dim(allTraits)
names(allTraits)
# Form a data frame analogous to expression data that will hold the clinical traits.
femaleSamples = rownames(datExpr);
traitRows = match(femaleSamples, allTraits$Mice);
datTraits = allTraits[traitRows, -1];
rownames(datTraits) = allTraits[traitRows, 1];
collectGarbage();
#We now have the expression data in the variable datExpr, and the corresponding clinical traits in the variable datTraits. Before we continue with network construction and module detection, we visualize how the clinical traits relate to the sample dendrogram.(将表达数据集datExpr和datTraits作聚类图,Fig.2)
# Re-cluster samples
sampleTree2 = hclust(dist(datExpr), method = "average")# Convert traits to a color representation: white means low, red means high, grey means missing entry
traitColors = numbers2colors(datTraits, signed = FALSE);# Plot the sample dendrogram and the colors underneath.
plotDendroAndColors(sampleTree2, traitColors,
groupLabels = names(datTraits),
main = "Sample dendrogram and trait heatmap")
#将datExpr和datTraits数据集保存至FemaleLiver-01-dataInput.RData,下次继续使用数据集
save(datExpr, datTraits, file = "FemaleLiver-01-dataInput.RData")
贴出来主要是为了相互交流学习
请关注“恒诺新知”微信公众号,感谢“R语言“,”数据那些事儿“,”老俊俊的生信笔记“,”冷🈚️思“,“珞珈R”,“生信星球”的支持!