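Tutorial: Gene Set Enrichment Analysis
To get the code, go to: http://weinformatics.com:8084/root/003
Beginner level
The original post for this tutorial can be found on my GitHub. For the list of references I used to put this post together, see the end of the page.
Introduction
Suppose we have performed an RNA-seq (or microarray gene expression) experiment and now want to know which pathways/biological processes are enriched in our [differentially expressed] genes. A few terms are worth knowing before diving into the topic: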
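Gene set
A gene set is an unordered collection of genes that are functionally related.
Pathway
A pathway can be interpreted as a gene set by ignoring the functional relationships among its genes.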
Gene Ontology (GO)
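GO describes gene function. A gene's role/function can be attributed to three main classes:
• Molecular Function (MF): defines the molecular activities of gene products.
• Cellular Component (CC): describes where gene products are active/localized.
• Biological Process (BP): describes the pathways and processes in which gene products are active.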
Kyoto Encyclopedia of Genes and Genomes (KEGG)
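KEGG is a collection of manually curated pathway maps representing molecular interaction and reaction networks.
There are several widely used methods for functional enrichment analysis: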
1- Over Representation Analysis (ORA): This is the simplest version of enrichment analysis and, at the same time, the most widely used approach. It is based on the p-value of a Fisher exact test on a contingency table. There is a relatively large number of web tools and R packages for ORA. Personally, I am a fan of the DAVID web tools; however, its last update was in 2016 (DAVID 6.8, Oct. 2016).
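To make the contingency-table idea concrete, here is a minimal sketch of the one-sided Fisher exact (hypergeometric) p-value that ORA relies on, written in plain Python. The function name ora_p_value and all the numbers are hypothetical and chosen only for illustration; real tools add multiple-testing correction and a carefully chosen background gene universe.

```python
from math import comb


def ora_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided Fisher exact p-value: the probability of seeing at least
    n_overlap gene-set members among n_selected genes drawn at random
    (without replacement) from a universe of n_universe genes."""
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 genes measured, 5 of them in the pathway,
# 5 genes called differentially expressed, 3 of those are in the pathway.
p = ora_p_value(n_universe=20, n_in_set=5, n_selected=5, n_overlap=3)
print(f"ORA p-value: {p:.4f}")
```

The same four counts fill the 2x2 contingency table (in set / not in set versus selected / not selected), so this is the table-based test expressed as a sum over the hypergeometric tail.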
Update: I sometimes have a list of genes and want to know which GO terms (usually BP) they have in common, regardless of statistical significance. The clusterProfiler package provides a function, groupGO, that can be used to answer this kind of question. I have added code for this analysis to the GitHub repository.
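The idea of profiling a gene list by shared terms, with no significance test, can be sketched in a few lines of Python. This is not clusterProfiler's implementation of groupGO; the gene names, term names, and the count_terms helper below are all hypothetical placeholders.

```python
from collections import Counter

# Hypothetical gene -> GO BP term annotations (placeholders, not real GO data).
annotations = {
    "GENE_A": {"term_1", "term_2"},
    "GENE_B": {"term_1"},
    "GENE_C": {"term_2", "term_3"},
    "GENE_D": {"term_1", "term_3"},
}


def count_terms(genes, annotations):
    """Count how many genes in the list are annotated to each term.
    Genes without any annotation are silently skipped."""
    counts = Counter()
    for gene in genes:
        counts.update(annotations.get(gene, ()))
    return counts


term_counts = count_terms(["GENE_A", "GENE_B", "GENE_D", "GENE_X"], annotations)
for term, n in term_counts.most_common():
    print(term, n)
```

Because no test is involved, the output is a descriptive tally only; a term can top the list simply because it annotates many genes in general.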
2- Gene Set Enrichment Analysis (GSEA):
It was developed by the Broad Institute. This is the preferred method when the genes come from an expression experiment such as microarray or RNA-seq. The original methodology was designed for microarray data, but later modifications made it suitable for RNA-seq as well. In this approach, you rank your genes by a statistic (such as the Wald statistic that DESeq2 provides) and then perform enrichment analysis against different pathways (= gene sets). You have to download the gene set files to your local system. The key point is that the algorithm uses all genes in the ranked list for enrichment analysis, in contrast to ORA, where only genes that pass a specific threshold (such as the DE ones) are used. You can find more details about the methodology in the original PNAS paper. To download the gene sets into a folder, go to the MSigDB website, register, and download the data.
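As a rough illustration of what happens to the ranked list, the sketch below parses gene sets from GMT-formatted text (the tab-separated format MSigDB distributes: set name, description, then the genes) and computes the weighted running-sum enrichment score described in the PNAS paper. It is a simplified sketch, not the Broad Institute or fgsea implementation: the permutation-based significance and normalization steps are left out, and the gene names, set names, and function names are hypothetical.

```python
def parse_gmt(text):
    """Parse GMT text: one gene set per line, tab-separated as
    name, description, gene1, gene2, ..."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = set(fields[2:])
    return gene_sets


def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score over ALL ranked genes.
    ranked: list of (gene, statistic) pairs; sorted here, highest first."""
    ordered = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    hit_total = sum(abs(s) ** weight for g, s in ordered if g in gene_set)
    n_miss = sum(1 for g, _ in ordered if g not in gene_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running, best = 0.0, 0.0
    for gene, stat in ordered:
        if gene in gene_set:
            running += abs(stat) ** weight / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss                     # step down on a miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


# Hypothetical toy data: six genes ranked by a Wald-like statistic.
ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0),
          ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
gmt_text = "TOY_SET_UP\thypothetical\tg1\tg2\nTOY_SET_DOWN\thypothetical\tg5\tg6\n"
gene_sets = parse_gmt(gmt_text)
scores = {name: enrichment_score(ranked, genes) for name, genes in gene_sets.items()}
print(scores)
```

A positive score means the set's genes cluster at the top of the ranking, a negative score means they cluster at the bottom; no threshold on individual genes is applied at any point.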
Steps in this tutorial:
(1) differential expression analysis,
(2) running GSEA,
and finally (3) visualization.
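The hand-off between step (1) and step (2) is a ranked gene list. A minimal sketch of building one from a differential expression results table follows, assuming a CSV export with a gene symbol column and a test statistic column. The column names "symbol" and "stat", the toy rows, and the choice to average duplicated symbols are assumptions made for this example, not the tutorial's own code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a DE results table exported as CSV.
de_results_csv = """symbol,stat
GENE_A,4.2
GENE_B,-3.1
GENE_A,3.8
GENE_C,NA
GENE_D,0.5
"""


def build_ranked_list(csv_text, gene_col="symbol", stat_col="stat"):
    """Return (gene, statistic) pairs sorted from highest to lowest.
    Rows with a missing symbol or statistic are dropped; duplicated
    symbols are collapsed to their mean statistic (one possible choice)."""
    values = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        gene, stat = row[gene_col], row[stat_col]
        if not gene or stat in ("", "NA"):
            continue
        values[gene].append(float(stat))
    means = {gene: sum(v) / len(v) for gene, v in values.items()}
    return sorted(means.items(), key=lambda pair: pair[1], reverse=True)


ranked_genes = build_ranked_list(de_results_csv)
for gene, stat in ranked_genes:
    print(gene, stat)
```

Every gene with a usable statistic stays in the list, which is what lets GSEA use the full ranking rather than a thresholded subset.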
References:
1- clusterProfiler: universal enrichment tool for functional and comparative study
2- Fast Gene Set Enrichment Analysis
3- Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles
4- DAVID Bioinformatics Resources 6.8
5- DESeq results to pathways in 60 Seconds with the fgsea package
6- Rank-rank hypergeometric overlap: identification of statistically significant overlap between gene-expression signatures
7- Clustering of DAVID gene enrichment results from gene expression studies by Kevin Blighe.