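Tutorial: Gene Set Enrichment Analysis
The original post for this tutorial can be found on my GitHub. For the list of references I used to put this thread together, see the very end of the page.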
Gene Ontology (GO)
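• Molecular Function (MF): defines the molecular activities of gene products.
• Cellular Component (CC): describes where in the cell the gene product is active (its localization).
• Biological Process (BP): describes the pathways and processes in which the gene product is active.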
Kyoto Encyclopedia of Genes and Genomes (KEGG)
1- Over Representation Analysis (ORA): This is the simplest form of enrichment analysis and, at the same time, the most widely used. The approach is based on the p-value of Fisher's exact test on a contingency table. There is a relatively large number of web tools and R packages for ORA. Personally, I am a fan of the DAVID web tools; however, its last update was in 2016 (DAVID 6.8, Oct. 2016).
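To make the contingency-table idea concrete, here is a minimal Python sketch of the test behind ORA. It assumes the one-sided (over-representation) form of Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution. The function name, parameter names, and the numbers in the example call are illustrative assumptions, not taken from DAVID or any specific package.

```python
from math import comb


def ora_p_value(n_universe, n_in_set, n_selected, n_overlap):
    """One-sided Fisher exact p-value for over-representation.

    n_universe : number of genes in the background (assumed name)
    n_in_set   : background genes annotated to the gene set
    n_selected : genes in your list (e.g. the DE genes)
    n_overlap  : genes in your list that are also in the gene set

    Returns P(overlap >= n_overlap) when n_selected genes are drawn
    at random, without replacement, from the background.
    """
    largest_possible = min(n_in_set, n_selected)
    all_draws = comb(n_universe, n_selected)
    # Sum the hypergeometric probabilities of every overlap at least
    # as large as the observed one.
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return tail / all_draws


# Made-up numbers, for illustration only.
p = ora_p_value(n_universe=20000, n_in_set=150, n_selected=400, n_overlap=12)
print(f"ORA p-value: {p:.3g}")
```

In practice this is computed once per gene set, and the resulting p-values are then corrected for multiple testing.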
Update: I sometimes have a list of genes and want to know which GO terms (usually BP) they have in common, regardless of statistical significance. The clusterProfiler package provides a function, groupGO, that can be used to answer this kind of question. I have added code for this analysis to the GitHub repository.
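The idea of summarizing shared terms without any significance test can be sketched in a few lines of Python. This is not the groupGO implementation and it ignores the GO hierarchy. It only counts how many genes in a list carry each term. The function name, gene identifiers, and term labels are all hypothetical placeholders.

```python
from collections import Counter


def count_terms(genes, gene_to_terms):
    """Count how many genes in `genes` are annotated to each term.

    gene_to_terms is an assumed mapping: gene id -> iterable of term ids.
    Returns (term, gene_count) pairs, most common first.
    """
    counts = Counter()
    for gene in genes:
        # set() guards against a term being listed twice for one gene.
        counts.update(set(gene_to_terms.get(gene, ())))
    return counts.most_common()


# Hypothetical annotation table, for illustration only.
annotation = {
    "geneA": ["term1", "term2"],
    "geneB": ["term1"],
    "geneC": [],
}
for term, n_genes in count_terms(["geneA", "geneB", "geneC"], annotation):
    print(term, n_genes)
```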
2- Gene Set Enrichment Analysis (GSEA):
It was developed by the Broad Institute. This is the preferred method when the genes come from an expression experiment such as microarray or RNA-seq. The original methodology was designed for microarray data, but later modifications made it suitable for RNA-seq as well. In this approach, you rank your genes by a statistic (such as the Wald statistic that DESeq2 provides) and then perform enrichment analysis against different pathways (gene sets). You have to download the gene set files to your local system: go to the MSigDB website, register, and download the data into a folder. The key point is that the algorithm uses all genes in the ranked list for enrichment analysis, in contrast to ORA, where only the genes that pass a specific threshold (such as the differentially expressed ones) are used. You can find more details about the methodology in the original PNAS paper.
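As a rough illustration of how a ranked list is scored against one gene set, the following Python sketch computes a running-sum enrichment score in the spirit of the original GSEA paper: walking down the ranked list, the sum increases at genes in the set (weighted by the absolute ranking statistic) and decreases at genes outside it, and the score is the largest deviation from zero. It is a simplified sketch, not the Broad or fgsea implementation. It omits the permutation-based significance and normalization steps, and the function name, the `weight` parameter, and the toy input are assumptions.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes : gene ids sorted by decreasing ranking statistic
    scores       : the ranking statistic for each gene, in the same order
    gene_set     : set of gene ids that define the pathway
    weight       : exponent applied to |score| for genes in the set
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    hit_weights = [
        abs(score) ** weight if hit else 0.0
        for score, hit in zip(scores, in_set)
    ]
    total_hit_weight = sum(hit_weights)
    n_misses = len(ranked_genes) - sum(in_set)
    if total_hit_weight == 0 or n_misses == 0:
        raise ValueError("gene set must partly, but not fully, overlap the list")

    running_sum = 0.0
    best = 0.0
    for hit_weight, hit in zip(hit_weights, in_set):
        # Step up at a gene in the set, step down at any other gene.
        if hit:
            running_sum += hit_weight / total_hit_weight
        else:
            running_sum -= 1.0 / n_misses
        # Keep the largest deviation from zero, with its sign.
        if abs(running_sum) > abs(best):
            best = running_sum
    return best


# Toy ranked list, for illustration only.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
stats = [4.0, 2.5, 1.0, -0.5, -2.0, -3.5]
print(enrichment_score(genes, stats, {"g1", "g2"}))
```

A positive score means the set is concentrated at the top of the ranking and a negative score means it is concentrated at the bottom.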
and finally (3) visualization.
1- clusterProfiler: universal enrichment tool for functional and comparative study
2- Fast Gene Set Enrichment Analysis
3- Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles
4- DAVID Bioinformatics Resources 6.8
5- DESeq results in pathways in 60 Seconds with the fgsea package
6- Rank-rank hypergeometric overlap: identification of statistically significant overlap between gene-expression signatures
7- Clustering of DAVID gene enrichment results from gene expression studies by Kevin Blighe.