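[RNA-Seq data conversion tip] Converting counts to FPKM easily with the countToFPKM package
Preface: I recently needed to convert counts to FPKM. I looked through a lot of material online, but my math is not strong enough to work the calculation out by hand, and I was too embarrassed to ask the experts. So I fell back on an existing package; luckily one exists, otherwise I would have been in for a lot of trouble. It is simple enough that I have copied the package description over as-is.
Reference: https://github.com/AAlhendi1707/countToFPKM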
The ‘countToFPKM’ package provides a robust function to convert the feature counts of paired-end RNA-Seq into FPKM values normalised by library size and effective feature length. It implements the algorithm described in Trapnell, C. et al. (2010).
This package includes two functions:
fpkm()
fpkmheatmap()
The fpkm() function converts the feature counts into FPKM values. It requires three arguments to return FPKM as a numeric matrix normalized by library size and feature length:
counts: a numeric matrix of raw feature counts.
featureLength: a numeric vector with feature lengths, which can be obtained using the biomaRt package.
meanFragmentLength: a numeric vector with mean fragment lengths, which can be calculated using the CollectInsertSizeMetrics (Picard) tool.
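To make the normalisation concrete, here is a minimal base R sketch. It assumes the common definition FPKM = counts × 10^9 / (effective length × library size), with the effective length taken as featureLength - meanFragmentLength + 1. The function name fpkm_sketch is hypothetical, and this is an illustration of the formula rather than the package's own code.

```r
# Hypothetical helper illustrating the FPKM formula; not the package's implementation.
fpkm_sketch <- function(counts, featureLength, meanFragmentLength) {
  # One length per feature (row) and one mean fragment length per sample (column).
  stopifnot(length(featureLength) == nrow(counts),
            length(meanFragmentLength) == ncol(counts))

  # Effective length for every feature/sample pair (assumed definition).
  effLen <- outer(featureLength, meanFragmentLength, "-") + 1

  # Library size: total counts per sample.
  libSize <- colSums(counts)

  # counts * 1e9 / (effective length * library size), column by column.
  sweep(counts * 1e9 / effLen, 2, libSize, "/")
}

# Toy data: 3 features x 2 samples.
toy_counts <- matrix(c(100, 200, 700,
                       50, 150, 300),
                     nrow = 3,
                     dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
toy_fpkm <- fpkm_sketch(toy_counts,
                        featureLength = c(1200, 2200, 5200),
                        meanFragmentLength = c(201, 201))
print(toy_fpkm)
```

Note that this sketch does not handle features whose effective length is zero or negative (feature shorter than the mean fragment length); real data would need those rows filtered out first.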
The fpkmheatmap() function provides users with a robust method to generate an FPKM heatmap plot of the highly variable features in an RNA-Seq dataset. It takes as input an FPKM numeric matrix, which can be obtained using the fpkm() function. By default it uses one minus the Pearson correlation to measure the distance between features, and one minus the Spearman correlation for clustering of samples. By default a log10 transformation of (FPKM+1) is applied to make variation similar across orders of magnitude. It uses the var() function to identify the highly variable features, then uses the Heatmap() function from the ‘ComplexHeatmap’ package to generate the heatmap plot.
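The selection and distance steps described above can be sketched in base R. This is only an illustration under stated assumptions: the helper name top_variable_features is hypothetical, and whether the package computes the variance before or after the log transformation is not stated here, so the sketch assumes it is computed on log10(FPKM+1).

```r
# Hypothetical helper: keep the `topvar` most variable features of log10(FPKM + 1).
top_variable_features <- function(fpkm_matrix, topvar = 30) {
  logged <- log10(fpkm_matrix + 1)
  # Per-feature variance across samples (assumed to be taken on the logged values).
  feature_var <- apply(logged, 1, var)
  n_keep <- min(topvar, nrow(logged))
  keep <- order(feature_var, decreasing = TRUE)[seq_len(n_keep)]
  logged[keep, , drop = FALSE]
}

# Toy FPKM matrix: 3 features x 3 samples.
toy <- matrix(c(0, 9, 99,
                9, 9, 9,
                0, 0, 9),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2", "s3")))

top <- top_variable_features(toy, topvar = 2)

# Correlation-based distances: 1 - Pearson between features (rows),
# 1 - Spearman between samples (columns).
feature_dist <- as.dist(1 - cor(t(top), method = "pearson"))
sample_dist  <- as.dist(1 - cor(top, method = "spearman"))
print(rownames(top))
```

The constant feature g2 has zero variance and is dropped, which is the behaviour one wants before computing correlations, since a constant row has no defined correlation.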
To cite the R package ‘countToFPKM’ in publications use:
“Alhendi, A.S.N. (2019). countToFPKM: Convert Counts to Fragments per Kilobase of Transcript per Million (FPKM). R package version 1.0.0. https://CRAN.R-project.org/package=countToFPKM”
Installation
## Install dependencies
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ComplexHeatmap")
## Stable version, install countToFPKM from CRAN
install.packages("countToFPKM")
## Latest version, install countToFPKM from GitHub
if(!require(devtools)) install.packages("devtools")
devtools::install_github("AAlhendi1707/countToFPKM", build_vignettes = TRUE)
Usage example
library(countToFPKM)
file.readcounts <- system.file("extdata", "RNA-seq.read.counts.csv", package="countToFPKM")
file.annotations <- system.file("extdata", "Biomart.annotations.hg38.txt", package="countToFPKM")
file.sample.metrics <- system.file("extdata", "RNA-seq.samples.metrics.txt", package="countToFPKM")
# Import the read count matrix data into R.
counts <- as.matrix(read.csv(file.readcounts))
# Import feature annotations.
# Assign feature length into a numeric vector.
gene.annotations <- read.table(file.annotations, sep="\t", header=TRUE)
featureLength <- gene.annotations$length
# Import sample metrics.
# Assign mean fragment length into a numeric vector.
samples.metrics <- read.table(file.sample.metrics, sep="\t", header=TRUE)
meanFragmentLength <- samples.metrics$meanFragmentLength
# Return FPKM into a numeric matrix.
fpkm_matrix <- fpkm(counts, featureLength, meanFragmentLength)
# Plot log10(FPKM+1) heatmap of top 30 highly variable features
fpkmheatmap(fpkm_matrix, topvar=30, showfeaturenames=TRUE, return_log = TRUE)
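With your own data, the length vector has to be in the same order as the rows of the count matrix, and the fragment-length vector in the same order as its columns. The sketch below shows one way to align them with match() before calling fpkm(). The toy tables and the column names gene_id and sample are hypothetical, chosen only for illustration.

```r
# Toy count matrix: rows are features, columns are samples.
counts <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Toy annotation table in a different row order, with one extra gene.
# Column names `gene_id` and `length` are assumptions for this example.
annot <- data.frame(gene_id = c("geneC", "geneA", "geneX", "geneB"),
                    length  = c(3000, 1000, 500, 2000))

# Toy sample metrics, also listed in a different order than the count columns.
metrics <- data.frame(sample = c("s2", "s1"),
                      meanFragmentLength = c(250, 200))

# Reorder both vectors to follow the rows and columns of `counts`.
featureLength      <- annot$length[match(rownames(counts), annot$gene_id)]
meanFragmentLength <- metrics$meanFragmentLength[match(colnames(counts), metrics$sample)]

# Fail early if any feature or sample is missing from the lookup tables.
stopifnot(!anyNA(featureLength), !anyNA(meanFragmentLength),
          length(featureLength) == nrow(counts),
          length(meanFragmentLength) == ncol(counts))
```

After these checks pass, the three objects can be handed to fpkm() exactly as in the usage example.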