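[TCGA Data Mining] Model-based clustering with R/mclust to integrate clinical classification
Clinical bioinformatics papers often use model-based clustering to combine clinical data with molecular data. The steps below show how to do this with mclust.
Load the required packages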
library(mclust)
library(Biobase)
library(ConsensusClusterPlus)
library(BS831)
library(cba)
library(pheatmap)
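Cluster the data
## Mclust treats rows as observations, so transpose the genes-by-samples expression matrix
MC <- Mclust(t(Biobase::exprs(cancerSet)), G = 1:4)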
summary(MC)
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust VEI (diagonal, equal shape) model with 3 components:
##
## log-likelihood n df BIC ICL
## -200111.4 45 16004 -461144.6 -461144.6
##
## Clustering table:
## 1 2 3
## 18 20 7
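To make the role of `G = 1:4` concrete, here is a minimal sketch on simulated data rather than on `cancerSet`. The matrix `X`, the group sizes and the shift of 4 are made up for illustration. Mclust fits each candidate covariance model for 1 to 4 components and keeps the combination with the best BIC (in mclust's convention, larger is better).

```r
library(mclust)

set.seed(1)
# Toy data (assumption): 45 samples x 5 features in three shifted groups
grp <- rep(1:3, times = c(18, 20, 7))
X <- matrix(rnorm(45 * 5), nrow = 45) + 4 * grp  # each row is shifted by its group

# Try 1 to 4 mixture components; the model is chosen by BIC
fit <- Mclust(X, G = 1:4, verbose = FALSE)

fit$modelName         # covariance structure of the selected model
fit$G                 # number of components selected
summary(fit$BIC)      # top-ranked (model, G) combinations by BIC
table(fit$classification)
```

On real expression data the number of genes usually far exceeds the number of samples, so filtering to the most variable genes before calling `Mclust` may be worth considering. That is a suggestion, not something the original analysis states.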
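The summary shows that the samples were split into three clusters. How are these three clusters distributed across the clinical samples?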
## show cluster assignments (and disease state)
head(data.frame(MC$classification,pData(cancerSet)$Characteristics.DiseaseState))
## MC.classification pData.cancerSet..Characteristics.DiseaseState
## GSM85480 1 sporadic basal-like
## GSM85490 1 sporadic basal-like
## GSM85500 2 non-basal-like
## GSM85484 1 sporadic basal-like
## GSM85478 1 sporadic basal-like
## GSM85516 3 normal
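The same kind of listing can be built with explicit column names and with the per-sample classification uncertainty that Mclust stores. This is a sketch on the built-in `iris` data, used here only as a stand-in for `cancerSet`; the column names `cluster`, `uncertainty` and `species` are illustrative choices.

```r
library(mclust)

# Stand-in data (assumption): four numeric measurements from iris
fit <- Mclust(iris[, 1:4], G = 1:4, verbose = FALSE)

# One row per sample: hard assignment, uncertainty, and the known label
assignments <- data.frame(
  cluster     = fit$classification,
  uncertainty = round(fit$uncertainty, 3),  # 1 minus the largest posterior probability
  species     = iris$Species
)
head(assignments)
```

Samples with high uncertainty sit between components and are worth inspecting before cluster labels are compared with clinical phenotypes.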
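The listing above only shows the first few samples. What we actually need is the overall distribution: a tabulation of how clusters 1, 2 and 3 differ by clinical phenotype.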
## contingency table
print(ftable(MC$classification,pData(cancerSet)$Characteristics.DiseaseState))
##    non-basal-like normal sporadic basal-like
##
## 1               0      0                  18
## 2              20      0                   0
## 3               0      7                   0
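One way to put a number on the association in this table is an exact test on the cross-tabulation. The sketch below rebuilds the two label vectors from the counts printed above (18, 20 and 7) instead of reading them from `cancerSet`; choosing Fisher's exact test is an assumption made here, not a step from the original analysis.

```r
# Rebuild the labels from the counts in the contingency table above
cluster <- rep(1:3, times = c(18, 20, 7))
disease <- rep(c("sporadic basal-like", "non-basal-like", "normal"),
               times = c(18, 20, 7))

# Cross-tabulate cluster against disease state
tab <- table(cluster, disease)
print(tab)

# Exact test of independence, suitable for small counts
ft <- fisher.test(tab)
ft$p.value
```

Each cluster here maps onto exactly one disease state, so the test only confirms what the table already shows. With less clean-cut clusters, the p-value and the row-wise proportions from `prop.table(tab, 1)` become more informative.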
With these results in hand, the figure can be drawn with pheatmap or ComplexHeatmap.
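A heatmap with cluster and disease-state annotation bars could be drawn roughly as in the sketch below. The expression matrix `mat` is simulated and the annotation column names `Cluster` and `DiseaseState` are illustrative. With real data, `mat` would be the expression matrix and `Cluster` would come from `MC$classification`.

```r
library(pheatmap)

set.seed(1)
# Group sizes taken from the clustering table; expression values are simulated
cluster <- rep(1:3, times = c(18, 20, 7))
disease <- rep(c("sporadic basal-like", "non-basal-like", "normal"),
               times = c(18, 20, 7))

# Toy genes-by-samples matrix (assumption): 30 genes, 45 samples
mat <- matrix(rnorm(30 * 45), nrow = 30,
              dimnames = list(paste0("gene", 1:30), paste0("sample", 1:45)))
# Give each cluster its own block of up-shifted genes so a pattern is visible
mat[1:10,  cluster == 1] <- mat[1:10,  cluster == 1] + 2
mat[11:20, cluster == 2] <- mat[11:20, cluster == 2] + 2
mat[21:30, cluster == 3] <- mat[21:30, cluster == 3] + 2

# Column annotations; row names must match the column names of mat
ann <- data.frame(Cluster = factor(cluster),
                  DiseaseState = disease,
                  row.names = colnames(mat))

hm <- pheatmap(mat,
               annotation_col = ann,
               cluster_cols = FALSE,   # keep samples ordered by cluster
               show_colnames = FALSE,
               scale = "row")          # z-score each gene across samples
```

Keeping `cluster_cols = FALSE` preserves the sample order, so the annotation bars appear as contiguous blocks and agreement between clusters and phenotype is easy to read.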

If this step is unfamiliar, a follow-up post will walk through a published article in detail.