      [RNA-seq Data Analysis] Make Your Differential Gene Expression Analysis "Bloom"

      • Posted by: weinfoeditor
      • Category: TCGA Data Mining
      • Date: September 14, 2021

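      After sequencing, differential gene analysis and group-wise presentation of the results are unavoidable steps, and a good analysis presents its results in a form that is both attractive and highly readable. In practice, however, the repeated format conversions are tedious, and producing a polished result still takes real effort for most people. So let us hand the job to today's protagonist. Translating is hard work too, so read whichever parts you need.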

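      Abstract

      Differential gene expression (DGE) is one of the most common applications of RNA sequencing (RNA-seq) data. This process allows for the elucidation of differentially expressed genes (DEGs) across two or more conditions. Interpreting DGE results can be unintuitive and time consuming, because the output format depends on the tool used and the result files contain a large amount of information. Here we present an R package, ViDGER (Visualization of Differential Gene Expression Results using R), which contains nine functions that generate information-rich visualizations for interpreting DGE results from three widely used tools: Cuffdiff, DESeq2, and edgeR.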

      Example S1: Installation and data examples

      A stable version of this package is available on Bioconductor. You can install it by running the following:

      if (!requireNamespace("BiocManager", quietly=TRUE))
          install.packages("BiocManager")
      BiocManager::install("vidger")

      The latest development version of ViDGER can be installed from GitHub using the devtools package:

      if (!require("devtools")) install.packages("devtools")
      devtools::install_github("btmonier/vidger", ref = "devel")

      Once installed, you will have access to the following functions:

      • vsBoxPlot()
      • vsScatterPlot()
      • vsScatterMatrix()
      • vsDEGMatrix()
      • vsMAPlot()
      • vsMAMatrix()
      • vsVolcano()
      • vsVolcanoMatrix()
      • vsFourWay()

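      In the following examples, three toy data sets will be used: df.cuff, df.deseq, and df.edger. Each of these data sets reflects one of the three RNA-seq analyses this package covers. They can be loaded into the R workspace with the following command: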

      data(<data_set>)

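      where <data_set> is one of the previously mentioned data sets. Two recurring elements found in each of these functions are the type and d.factor arguments. The type argument tells the function how to process the data for each analysis type (i.e. "cuffdiff", "deseq", or "edger"). The d.factor argument is used specifically for DESeq2 objects, which are discussed in the DESeq2 sections. All other arguments are described in further detail in each function's help file (e.g. ?vsScatterPlot).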

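      Overview of the data used

      As mentioned earlier, three toy data sets are included with this package. In addition to these data sets, 5 "real-world" data sets were also used. All real-world data used are currently unpublished and come from ongoing collaborations. Summaries of these data can be found in the tables below:

      Table 1: An overview of the toy data sets included in this package. In this table, each data set is summarized in terms of the analytical software used, organism ID, experimental layout (replicates and treatments), number of transcripts (IDs), and size of the data object in megabytes (MB).

      Data     | Software | Organism        | Replicates | Treatments | IDs   | Size (MB)
      df.cuff  | CuffDiff | H. sapiens      | 2          | 3          | 1200  | 0.2
      df.deseq | DESeq2   | D. melanogaster | 2          | 3          | 29391 | 2.3
      df.edger | edgeR    | A. thaliana     | 2          | 3          | 724   | 0.1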

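      Table 2: "Real-world" (RW) data set statistics. To test the reliability of our package, we used real data from human collections and several plant samples. Each data set is summarized in terms of organism ID, number of experimental samples (n), experimental conditions, and number of transcripts (IDs).

      Data | Organism                          | n  | Experimental conditions                                                                       | IDs
      RW-1 | H. sapiens                        | 10 | Two treatment dosages taken at two time points and one control sample taken at one time point | 198002
      RW-2 | M. domestica                      | 24 | Two phenotypes taken at four time points (three replicates each)                              | 63517
      RW-3 | V. riparia: bud                   | 6  | Two conditions (three replicates each)                                                        | 59262
      RW-4 | V. riparia: shoot-tip (7 days)    | 6  | Two conditions (three replicates each)                                                        | 17962
      RW-5 | V. riparia: shoot-tip (21 days)   | 6  | Two conditions (three replicates each)                                                        | 19064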

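      Example S2: Creating box plots

      Box plots are a useful way to examine the distribution of data. In this case, we can use the vsBoxPlot() function to examine the distribution of FPKM or CPM values. This function extracts the necessary results-based data from the analysis object and creates a box plot comparing the log10 (FPKM or CPM) distributions of the experimental treatments.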
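
To make the quantity behind these plots concrete, here is a minimal base R sketch of a per-treatment log10 summary. It does not call ViDGER; the data frame `expr`, its column names, and the choice to drop zero values before the log transform are assumptions made for illustration only, not a description of how vsBoxPlot() handles its input.

```r
# Hypothetical toy data: expression values (e.g. FPKM or CPM) for two treatments.
set.seed(1)
expr <- data.frame(
  treatment = rep(c("A", "B"), each = 50),
  value     = c(rlnorm(50, meanlog = 2), rlnorm(50, meanlog = 3))
)

# Assumption of this sketch: drop non-positive values, since log10(0) is -Inf.
expr <- expr[expr$value > 0, ]
expr$log10_value <- log10(expr$value)

# Compute the box plot statistics per treatment without drawing anything.
# bp$stats holds, per column, lower whisker, lower hinge, median,
# upper hinge, and upper whisker.
bp <- boxplot(log10_value ~ treatment, data = expr, plot = FALSE)

print(bp$names)
print(round(bp$stats, 2))
```

Setting `plot = TRUE` (the default) draws the same summary with base graphics, which is a quick way to sanity-check the distributions before producing the final figure.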

      With Cuffdiff

      vsBoxPlot(
          data = df.cuff, d.factor = NULL, type = 'cuffdiff', title = TRUE, 
          legend = TRUE, grid = TRUE
      )

      Figure 1: A box plot example using the vsBoxPlot() function with cuffdiff data. In this example, FPKM distributions for each treatment within an experiment are shown in the form of a box and whisker plot.

      With DESeq2

      vsBoxPlot(
          data = df.deseq, d.factor = 'condition', type = 'deseq', 
          title = TRUE, legend = TRUE, grid = TRUE
      )

      Figure 2: A box plot example using the vsBoxPlot() function with DESeq2 data. In this example, FPKM distributions for each treatment within an experiment are shown in the form of a box and whisker plot.

      With edgeR

      vsBoxPlot(
          data = df.edger, d.factor = NULL, type = 'edger', 
          title = TRUE, legend = TRUE, grid = TRUE
      )

      Figure 3: A box plot example using the vsBoxPlot() function with edgeR data. In this example, CPM distributions for each treatment within an experiment are shown in the form of a box and whisker plot.

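      Aesthetic variants of box plots

      vsBoxPlot() offers several ways to display the data distribution. These are selected with the aes parameter. Currently, there are 6 different variants: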

      • box: standard box plot
      • violin: violin plot
      • boxdot: box plot with dot plot overlay
      • viodot: violin plot with dot plot overlay
      • viosumm: violin plot with summary stats overlay
      • notch: box plot with notch

      box variant

      data("df.edger")
      vsBoxPlot(
         data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
         legend = TRUE, grid = TRUE, aes = "box"
      )

      Figure 4: A box plot example using the aes parameter: box.

      violin variant

      data("df.edger")
      vsBoxPlot(
         data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
         legend = TRUE, grid = TRUE, aes = "violin"
      )

      Figure 5: A box plot example using the aes parameter: violin.

      boxdot variant

      data("df.edger")
      vsBoxPlot(
         data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
         legend = TRUE, grid = TRUE, aes = "boxdot"
      )

      Figure 6: A box plot example using the aes parameter: boxdot.

      viodot variant

      data("df.edger")
      vsBoxPlot(
         data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
         legend = TRUE, grid = TRUE, aes = "viodot"
      )

      Figure 7: A box plot example using the aes parameter: viodot.

      viosumm variant

      data("df.edger")
      vsBoxPlot(
         data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
         legend = TRUE, grid = TRUE, aes = "viosumm"
      )

      Figure 8: A box plot example using the aes parameter: viosumm.

      notch variant

      data("df.edger")
      vsBoxPlot(
         data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
         legend = TRUE, grid = TRUE, aes = "notch"
      )

      Figure 9: A box plot example using the aes parameter: notch.

      Color palette variants of box plots

      In addition to the aesthetic changes, the fill color of each variant can also be changed. This is done by modifying the fill.color parameter.

      The palettes that can be used for this parameter are based on the palettes found in the RColorBrewer package. A visual list of all the palettes is available online.

      Color variant example 1

      data("df.edger")
      vsBoxPlot(
         data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
         legend = TRUE, grid = TRUE, aes = "box", fill.color = "RdGy"
      )

      Figure 10: Color variant 1
      A box plot example using the fill.color parameter: RdGy.

      Color variant example 2

      data("df.edger")
      vsBoxPlot(
         data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
         legend = TRUE, grid = TRUE, aes = "viosumm", fill.color = "Paired"
      )

      Figure 11: Color variant 2
      A violin plot example using the fill.color parameter: Paired with the aes parameter: viosumm.

      Color variant example 3

      data("df.edger")
      vsBoxPlot(
         data = df.edger, d.factor = NULL, type = "edger", title = TRUE,
         legend = TRUE, grid = TRUE, aes = "notch", fill.color = "Greys"
      )

      Figure 12: Color variant 3
      A notched box plot example using the fill.color parameter: Greys with the aes parameter: notch. Using these parameters, we can also generate grey-scale plots.

      Example S3: Creating scatter plots

      This example looks at a basic scatter plot function, vsScatterPlot(). This function lets you compare the log10 values of either FPKM or CPM measurements of two treatments, depending on the analysis type.

      With Cuffdiff

      vsScatterPlot(
          x = 'hESC', y = 'iPS', data = df.cuff, type = 'cuffdiff',
          d.factor = NULL, title = TRUE, grid = TRUE
      )

      Figure 13: A scatterplot example using the vsScatterPlot() function with Cuffdiff data. In this visualization, log10 comparisons are made of fragments per kilobase of transcript per million mapped reads (FPKM) measurements. The dashed line represents the regression line for the comparison.

      With DESeq2

      vsScatterPlot(
          x = 'treated_paired.end', y = 'untreated_paired.end', 
          data = df.deseq, type = 'deseq', d.factor = 'condition', 
          title = TRUE, grid = TRUE
      )

      Figure 14: A scatterplot example using the vsScatterPlot() function with DESeq2 data. In this visualization, log10 comparisons are made of fragments per kilobase of transcript per million mapped reads (FPKM) measurements. The dashed line represents the regression line for the comparison.

      With edgeR

      vsScatterPlot(
          x = 'WM', y = 'MM', data = df.edger, type = 'edger',
          d.factor = NULL, title = TRUE, grid = TRUE
      )

      Figure 15: A scatterplot example using the vsScatterPlot() function with edgeR data. In this visualization, log10 comparisons are made of counts per million (CPM) measurements. The dashed line represents the regression line for the comparison.

      Example S4: Creating scatter plot matrices

      This example looks at vsScatterMatrix(), an extension of the vsScatterPlot() function. It creates a matrix of all possible comparisons of treatments within an experiment, together with distribution histograms and correlation values.
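
As a rough illustration of the correlation values such a matrix reports, the base R sketch below computes pairwise correlations of log10 expression values for three treatments. The matrix `expr`, the treatment names, and the use of Pearson correlation are assumptions for illustration; the sketch does not call ViDGER and is not a description of its internals.

```r
# Hypothetical toy matrix: rows are transcripts, columns are treatments.
set.seed(42)
base_expr <- rlnorm(200, meanlog = 3)
expr <- cbind(
  A = base_expr * rlnorm(200, sdlog = 0.1),
  B = base_expr * rlnorm(200, sdlog = 0.1),
  C = base_expr * rlnorm(200, sdlog = 0.5)   # noisier treatment
)

# log10 transform, then Pearson correlation for every pair of treatments.
log_expr <- log10(expr)
corr <- cor(log_expr)

# All unordered treatment pairs, i.e. one scatter plot panel per column.
pairs_idx <- combn(colnames(expr), 2)

print(round(corr, 3))
print(pairs_idx)
```

With three treatments there are three unordered pairs, so the matrix has three scatter panels; the noisier treatment C correlates less strongly with A than B does.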

      With Cuffdiff

      vsScatterMatrix(
          data = df.cuff, d.factor = NULL, type = 'cuffdiff', 
          comp = NULL, title = TRUE, grid = TRUE, man.title = NULL
      )

      Figure 16: A scatterplot matrix example using the vsScatterMatrix() function with Cuffdiff data. Similar to the scatterplot function, this visualization allows for all comparisons to be made within an experiment. In addition to the scatterplot visuals, FPKM distributions (histograms) and correlation (Corr) values are generated.

      With DESeq2

      vsScatterMatrix(
          data = df.deseq, d.factor = 'condition', type = 'deseq',
          comp = NULL, title = TRUE, grid = TRUE, man.title = NULL
      )

      Figure 17: A scatterplot matrix example using the vsScatterMatrix() function with DESeq2 data. Similar to the scatterplot function, this visualization allows for all comparisons to be made within an experiment. In addition to the scatterplot visuals, FPKM distributions (histograms) and correlation (Corr) values are generated.

      With edgeR

      vsScatterMatrix(
          data = df.edger, d.factor = NULL, type = 'edger', comp = NULL,
          title = TRUE, grid = TRUE, man.title = NULL
      )

      Figure 18: A scatterplot matrix example using the vsScatterMatrix() function with edgeR data. Similar to the scatterplot function, this visualization allows for all comparisons to be made within an experiment. In addition to the scatterplot visuals, CPM distributions (histograms) and correlation (Corr) values are generated.

      Example S5: Creating differential gene expression matrices

      The vsDEGMatrix() function lets the user visualize the number of differentially expressed genes (DEGs) at a given adjusted p-value (set with the padj parameter) for each pair of experimental treatment levels. Higher color intensity corresponds to a higher number of DEGs.
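
The counting behind such a matrix can be sketched in a few lines of base R. The results table `res`, its column names, the "at or below the cutoff" rule, and the treatment of missing adjusted p-values are all assumptions for illustration; the sketch does not call ViDGER.

```r
# Hypothetical results table: one row per transcript per pairwise comparison.
res <- data.frame(
  sample_1 = c("A", "A", "A", "A", "B", "B"),
  sample_2 = c("B", "B", "C", "C", "C", "C"),
  padj     = c(0.01, 0.20, 0.03, 0.04, 0.50, NA)
)

padj_cutoff <- 0.05

# Assumption: a transcript counts as a DEG when its adjusted p-value is at or
# below the cutoff; missing adjusted p-values count as not significant.
res$is_deg <- !is.na(res$padj) & res$padj <= padj_cutoff

# Number of DEGs for each pair of treatments (one cell of the matrix each).
deg_counts <- aggregate(is_deg ~ sample_1 + sample_2, data = res, FUN = sum)

print(deg_counts)
```

In this toy table the A vs B comparison has one DEG, A vs C has two, and B vs C has none, so the A vs C cell would be drawn with the strongest color.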

      With Cuffdiff

      vsDEGMatrix(
          data = df.cuff, padj = 0.05, d.factor = NULL, type = 'cuffdiff', 
          title = TRUE, legend = TRUE, grid = TRUE
      )

      Figure 19: A matrix of differentially expressed genes (DEGs) at a given p-value using the vsDEGMatrix() function with Cuffdiff data. With this function, the user is able to visualize the number of DEGs at a given adjusted p-value for each experimental treatment level. Higher color intensity correlates to a higher number of DEGs.

      With DESeq2

      vsDEGMatrix(
          data = df.deseq, padj = 0.05, d.factor = 'condition', 
          type = 'deseq', title = TRUE, legend = TRUE, grid = TRUE
      )

      Figure 20: A matrix of differentially expressed genes (DEGs) at a given p-value using the vsDEGMatrix() function with DESeq2 data. With this function, the user is able to visualize the number of DEGs at a given adjusted p-value for each experimental treatment level. Higher color intensity correlates to a higher number of DEGs.

      With edgeR

      vsDEGMatrix(
          data = df.edger, padj = 0.05, d.factor = NULL, type = 'edger', 
          title = TRUE, legend = TRUE, grid = TRUE
      )

      Figure 21: A matrix of differentially expressed genes (DEGs) at a given p-value using the vsDEGMatrix() function with edgeR data. With this function, the user is able to visualize the number of DEGs at a given adjusted p-value for each experimental treatment level. Higher color intensity correlates to a higher number of DEGs.

      Grey-scale DEG matrices

      A grey-scale option is available for this function if you wish to use a grey-to-white gradient instead of the classic blue-to-white gradient. This can be invoked by setting the grey.scale parameter to TRUE.

      vsDEGMatrix(data = df.deseq, d.factor = "condition", type = "deseq",
          grey.scale = TRUE
      )

      Example S6: Creating MA plots

      vsMAPlot() visualizes the variance between two samples in terms of gene expression values, plotting the logarithmic fold changes of count data against mean counts. For more information on how each aesthetic is plotted, please refer to the figure captions and Method S1.
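
The two axes of an MA plot can be written out directly. The base R sketch below uses made-up mean counts for two treatments; the vectors `x` and `y` and the direction of the fold change (y relative to x) are assumptions for illustration, and the sketch does not call ViDGER.

```r
# Hypothetical mean normalized counts for two treatments, one value per transcript.
x <- c(100, 50, 10, 400)
y <- c(200, 50, 40, 100)

# M: log2 fold change of y relative to x (the vertical axis).
M <- log2(y / x)

# A: mean of the two count values (the horizontal axis).
A <- (x + y) / 2

print(data.frame(x = x, y = y, M = M, A = A))
```

A transcript with equal counts in both treatments has M = 0, a doubling gives M = 1, and a four-fold decrease gives M = -2, which is why the points of an unchanged gene set scatter around the horizontal zero line.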

      With Cuffdiff

      vsMAPlot(
          x = 'iPS', y = 'hESC', data = df.cuff, d.factor = NULL, 
          type = 'cuffdiff', padj = 0.05, y.lim = NULL, lfc = NULL, 
          title = TRUE, legend = TRUE, grid = TRUE
      )

      Figure 22: MA plot visualization using the vsMAPlot() function with Cuffdiff data. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

      With DESeq2

      vsMAPlot(
          x = 'treated_paired.end', y = 'untreated_paired.end', 
          data = df.deseq, d.factor = 'condition', type = 'deseq', 
          padj = 0.05, y.lim = NULL, lfc = NULL, title = TRUE, 
          legend = TRUE, grid = TRUE
      )

      Figure 23: MA plot visualization using the vsMAPlot() function with DESeq2 data. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

      With edgeR

      vsMAPlot(
          x = 'WW', y = 'MM', data = df.edger, d.factor = NULL, 
          type = 'edger', padj = 0.05, y.lim = NULL, lfc = NULL, 
          title = TRUE, legend = TRUE, grid = TRUE
      )

      Figure 24: MA plot visualization using the vsMAPlot() function with edgeR data. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

      Example S7: Creating MA plot matrices

      Similar to a scatter plot matrix, vsMAMatrix() will produce visualizations for all comparisons within your data set. For more information on how the aesthetics are plotted in these visualizations, please refer to the figure caption and Method S1.

      With Cuffdiff

      vsMAMatrix(
          data = df.cuff, d.factor = NULL, type = 'cuffdiff', 
          padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE, 
          grid = TRUE, counts = TRUE, data.return = FALSE
      )

      Figure 25: An MA plot matrix using the vsMAMatrix() function with Cuffdiff data. Similar to the vsMAPlot() function, vsMAMatrix() will generate a matrix of MA plots for all comparisons within an experiment. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

      With DESeq2

      vsMAMatrix(
          data = df.deseq, d.factor = 'condition', type = 'deseq', 
          padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE, 
          grid = TRUE, counts = TRUE, data.return = FALSE
      )

      Figure 26: An MA plot matrix using the vsMAMatrix() function with DESeq2 data. Similar to the vsMAPlot() function, vsMAMatrix() will generate a matrix of MA plots for all comparisons within an experiment. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

      With edgeR

      vsMAMatrix(
          data = df.edger, d.factor = NULL, type = 'edger', 
          padj = 0.05, y.lim = NULL, lfc = 1, title = TRUE, 
          grid = TRUE, counts = TRUE, data.return = FALSE
      )

      Figure 27: An MA plot matrix using the vsMAMatrix() function with edgeR data. Similar to the vsMAPlot() function, vsMAMatrix() will generate a matrix of MA plots for all comparisons within an experiment. LFCs are plotted against mean counts to determine the variance between two treatments in terms of gene expression. Blue nodes on the graph represent statistically significant LFCs which are greater than a user-defined LFC parameter. Green nodes indicate statistically significant LFCs which are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Triangular shapes represent values that exceed the viewing area of the graph. Node size changes represent the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Dashed lines indicate user-defined LFC values.

      Example S8: Creating volcano plots

      The next few visualizations focus on ways to display differential gene expression between two or more treatments. Volcano plots visualize the variance between two samples in terms of gene expression values, where the -log10 of the calculated p-values (y-axis) is plotted against the log2 fold changes (x-axis). These plots can be produced with the vsVolcano() function. For more information on how each aesthetic is plotted, please refer to the figure captions and Method S1.
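
The transformation and the three-way grouping used in a volcano plot can be sketched in base R as follows. The table `res`, its column names, the cutoff values, and the exact comparison rules (adjusted p-value at or below the cutoff, absolute LFC above the cutoff) are assumptions for illustration; the sketch does not call ViDGER.

```r
# Hypothetical results for four transcripts from one pairwise comparison.
res <- data.frame(
  id   = c("g1", "g2", "g3", "g4"),
  lfc  = c(2.5, -0.4, -3.0, 1.2),       # log2 fold change (x-axis)
  padj = c(0.001, 0.01, 0.0001, 0.30)   # adjusted p-value
)

padj_cutoff <- 0.05
lfc_cutoff  <- 1

# y-axis of the volcano plot: smaller p-values end up higher on the plot.
res$neg_log10_p <- -log10(res$padj)

# Assumed grouping: not significant, significant with |LFC| above the cutoff,
# and significant with |LFC| at or below the cutoff.
res$class <- ifelse(
  res$padj > padj_cutoff, "not significant",
  ifelse(abs(res$lfc) > lfc_cutoff, "significant, large LFC", "significant, small LFC")
)

print(res)
```

Here g1 and g3 fall in the large-LFC group, g2 is significant but below the LFC cutoff, and g4 is not significant even though its fold change exceeds the cutoff.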

      With Cuffdiff

      vsVolcano(
          x = 'iPS', y = 'hESC', data = df.cuff, d.factor = NULL, 
          type = 'cuffdiff', padj = 0.05, x.lim = NULL, lfc = NULL, 
          title = TRUE, legend = TRUE, grid = TRUE, data.return = FALSE
      )




<p>Figure 28: <strong>A volcano plot example using the <code>vsVolcano()</code> function with <code>Cuffdiff</code> data.</strong> In this visualization, the $-\log_{10}$ <em>p</em>-value is plotted against the $\log_2$ fold change (LFC) between two treatments. Blue nodes represent statistically significant LFCs that are greater than the user-defined LFC parameter. Green nodes indicate statistically significant LFCs that are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Left and right brackets (&lt; and &gt;) represent values that exceed the viewing area of the graph. Node size reflects the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Vertical and horizontal lines indicate the user-defined LFC and adjusted <em>p</em>-value thresholds, respectively.</p>



<h2>With DESeq2</h2>



<pre><code>vsVolcano(
    x = 'treated_paired.end', y = 'untreated_paired.end',
    data = df.deseq, d.factor = 'condition', type = 'deseq',
    padj = 0.05, x.lim = NULL, lfc = NULL,
    title = TRUE, legend = TRUE, grid = TRUE, data.return = FALSE
)</code></pre>




<p>Figure 29: <strong>A volcano plot example using the <code>vsVolcano()</code> function with <code>DESeq2</code> data.</strong> In this visualization, the $-\log_{10}$ <em>p</em>-value is plotted against the $\log_2$ fold change (LFC) between two treatments. Blue nodes represent statistically significant LFCs that are greater than the user-defined LFC parameter. Green nodes indicate statistically significant LFCs that are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Left and right brackets (&lt; and &gt;) represent values that exceed the viewing area of the graph. Node size reflects the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Vertical and horizontal lines indicate the user-defined LFC and adjusted <em>p</em>-value thresholds, respectively.</p>



<h2>With edgeR</h2>



<pre><code>vsVolcano(
    x = 'WW', y = 'MM', data = df.edger, d.factor = NULL,
    type = 'edger', padj = 0.05, x.lim = NULL, lfc = NULL,
    title = TRUE, legend = TRUE, grid = TRUE, data.return = FALSE
)</code></pre>




<p>Figure 30: <strong>A volcano plot example using the <code>vsVolcano()</code> function with <code>edgeR</code> data.</strong> In this visualization, the $-\log_{10}$ <em>p</em>-value is plotted against the $\log_2$ fold change (LFC) between two treatments. Blue nodes represent statistically significant LFCs that are greater than the user-defined LFC parameter. Green nodes indicate statistically significant LFCs that are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. Numerical values in parentheses for each legend color indicate the number of transcripts that meet the prior conditions. Left and right brackets (&lt; and &gt;) represent values that exceed the viewing area of the graph. Node size reflects the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Vertical and horizontal lines indicate the user-defined LFC and adjusted <em>p</em>-value thresholds, respectively.</p>



<h1>Example S9: Creating volcano plot matrices</h1>



<p>Similar to the prior matrix functions, <code>vsVolcanoMatrix()</code> will produce visualizations for all comparisons within your data set. For more information on how the aesthetics are plotted in these visualizations, please refer to the figure caption and Method S1.</p>



<h2>With Cuffdiff</h2>



<pre><code>vsVolcanoMatrix(
    data = df.cuff, d.factor = NULL, type = 'cuffdiff',
    padj = 0.05, x.lim = NULL, lfc = NULL, title = TRUE,
    legend = TRUE, grid = TRUE, counts = TRUE
)</code></pre>




<p>Figure 31: <strong>A volcano plot matrix using the <code>vsVolcanoMatrix()</code> function with <code>Cuffdiff</code> data.</strong> Similar to the <code>vsVolcano()</code> function, <code>vsVolcanoMatrix()</code> will generate a matrix of volcano plots for all comparisons within an experiment. In each plot, the $-\log_{10}$ <em>p</em>-value is plotted against the $\log_2$ fold change (LFC) between two treatments. Blue nodes represent statistically significant LFCs that are greater than the user-defined LFC parameter. Green nodes indicate statistically significant LFCs that are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. The blue and green numbers in each facet represent the number of transcripts that meet the criteria for blue and green nodes in each comparison. Left and right brackets (&lt; and &gt;) represent values that exceed the viewing area of the graph. Node size reflects the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Vertical and horizontal lines indicate the user-defined LFC and adjusted <em>p</em>-value thresholds, respectively.</p>



<h2>With DESeq2</h2>



<pre><code>vsVolcanoMatrix(
    data = df.deseq, d.factor = 'condition', type = 'deseq',
    padj = 0.05, x.lim = NULL, lfc = NULL, title = TRUE,
    legend = TRUE, grid = TRUE, counts = TRUE
)</code></pre>




<p>Figure 32: <strong>A volcano plot matrix using the <code>vsVolcanoMatrix()</code> function with <code>DESeq2</code> data.</strong> Similar to the <code>vsVolcano()</code> function, <code>vsVolcanoMatrix()</code> will generate a matrix of volcano plots for all comparisons within an experiment. In each plot, the $-\log_{10}$ <em>p</em>-value is plotted against the $\log_2$ fold change (LFC) between two treatments. Blue nodes represent statistically significant LFCs that are greater than the user-defined LFC parameter. Green nodes indicate statistically significant LFCs that are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. The blue and green numbers in each facet represent the number of transcripts that meet the criteria for blue and green nodes in each comparison. Left and right brackets (&lt; and &gt;) represent values that exceed the viewing area of the graph. Node size reflects the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Vertical and horizontal lines indicate the user-defined LFC and adjusted <em>p</em>-value thresholds, respectively.</p>



<h2>With edgeR</h2>



<pre><code>vsVolcanoMatrix(
    data = df.edger, d.factor = NULL, type = 'edger',
    padj = 0.05, x.lim = NULL, lfc = NULL, title = TRUE,
    legend = TRUE, grid = TRUE, counts = TRUE
)</code></pre>




<p>Figure 33: <strong>A volcano plot matrix using the <code>vsVolcanoMatrix()</code> function with <code>edgeR</code> data.</strong> Similar to the <code>vsVolcano()</code> function, <code>vsVolcanoMatrix()</code> will generate a matrix of volcano plots for all comparisons within an experiment. In each plot, the $-\log_{10}$ <em>p</em>-value is plotted against the $\log_2$ fold change (LFC) between two treatments. Blue nodes represent statistically significant LFCs that are greater than the user-defined LFC parameter. Green nodes indicate statistically significant LFCs that are less than the user-defined LFC parameter. Gray nodes are data points that are not statistically significant. The blue and green numbers in each facet represent the number of transcripts that meet the criteria for blue and green nodes in each comparison. Left and right brackets (&lt; and &gt;) represent values that exceed the viewing area of the graph. Node size reflects the magnitude of the LFC values (i.e. larger shapes reflect larger LFC values). Vertical and horizontal lines indicate the user-defined LFC and adjusted <em>p</em>-value thresholds, respectively.</p>



<h1>Example S10: Creating four way plots</h1>



<p>To create four-way plots, use the <code>vsFourWay()</code> function. This plot compares the $\log_2$ fold changes between two samples and a ‘control’. For more information on how each of the aesthetics is plotted, please refer to the figure captions and Method S1.</p>



<h2>With Cuffdiff</h2>



<pre><code>vsFourWay(
    x = 'iPS', y = 'hESC', control = 'Fibroblasts', data = df.cuff,
    d.factor = NULL, type = 'cuffdiff', padj = 0.05, x.lim = NULL,
    y.lim = NULL, lfc = NULL, legend = TRUE, title = TRUE, grid = TRUE
)</code></pre>


      Figure 34: A four way plot visualization using the vsFourWay() function with
      Cuffdiff data. In this example, LFC comparisons between two treatments and a control are made. Blue nodes indicate statistically significant LFCs that are greater than a user-defined value for both the x- and y-axes. Green nodes reflect statistically significant LFCs that are less than a user-defined value for treatment y and greater than said value for treatment x. Similarly, red nodes reflect statistically significant LFCs that are greater than a user-defined value for treatment y and less than said value for treatment x. Gray nodes are data points that are not statistically significant for both the x- and y-axes. Triangular shapes indicate values that exceed the viewing area of the graph. Size change reflects the magnitude of LFC values (i.e. larger shapes reflect larger LFC values). Vertical and horizontal dashed lines indicate user-defined LFC values.

      With DESeq2

      vsFourWay(
          x = 'treated_paired.end', y = 'untreated_single.read', 
          control = 'untreated_paired.end', data = df.deseq, 
          d.factor = 'condition', type = 'deseq', padj = 0.05, x.lim = NULL, 
          y.lim = NULL, lfc = NULL, legend = TRUE, title = TRUE, grid = TRUE
      )


      Figure 35: A four way plot visualization using the vsFourWay() function with
      DESeq2 data. In this example, LFC comparisons between two treatments and a control are made. Blue nodes indicate statistically significant LFCs that are greater than a user-defined value for both the x- and y-axes. Green nodes reflect statistically significant LFCs that are less than a user-defined value for treatment y and greater than said value for treatment x. Similarly, red nodes reflect statistically significant LFCs that are greater than a user-defined value for treatment y and less than said value for treatment x. Gray nodes are data points that are not statistically significant for both the x- and y-axes. Triangular shapes indicate values that exceed the viewing area of the graph. Size change reflects the magnitude of LFC values (i.e. larger shapes reflect larger LFC values). Vertical and horizontal dashed lines indicate user-defined LFC values.

      With edgeR

      vsFourWay(
          x = 'WW', y = 'WM', control = 'MM', data = df.edger,
          d.factor = NULL, type = 'edger', padj = 0.05, x.lim = NULL,
          y.lim = NULL, lfc = NULL, legend = TRUE, title = TRUE, grid = TRUE
      )


      Figure 36: A four way plot visualization using the vsFourWay() function with
      edgeR data. In this example, LFC comparisons between two treatments and a control are made. Blue nodes indicate statistically significant LFCs that are greater than a user-defined value for both the x- and y-axes. Green nodes reflect statistically significant LFCs that are less than a user-defined value for treatment y and greater than said value for treatment x. Similarly, red nodes reflect statistically significant LFCs that are greater than a user-defined value for treatment y and less than said value for treatment x. Gray nodes are data points that are not statistically significant for both the x- and y-axes. Triangular shapes indicate values that exceed the viewing area of the graph. Size change reflects the magnitude of LFC values (i.e. larger shapes reflect larger LFC values). Vertical and horizontal dashed lines indicate user-defined LFC values.
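The four color classes described in the four-way plot captions can be sketched in base R as follows. The helper `four_way_class()`, its argument names, and the toy values are hypothetical. A treatment is taken to "pass" when it is significant and its |LFC| is at or above the cutoff, which is an interpretation of the captions and may not match the package's rule exactly.

```r
# Hypothetical helper: classify each ID from its LFC and adjusted p-value
# in two comparisons (treatment x vs control, treatment y vs control)
four_way_class <- function(lfc_x, lfc_y, padj_x, padj_y,
                           padj_cut = 0.05, lfc_cut = 1) {
  pass_x <- padj_x <= padj_cut & abs(lfc_x) >= lfc_cut
  pass_y <- padj_y <= padj_cut & abs(lfc_y) >= lfc_cut
  cls <- rep("gray", length(lfc_x))
  cls[pass_x & pass_y]  <- "blue"   # beyond the cutoff on both axes
  cls[pass_x & !pass_y] <- "green"  # beyond the cutoff for x only
  cls[!pass_x & pass_y] <- "red"    # beyond the cutoff for y only
  cls
}

# Toy values (assumptions), one ID per class
classes <- four_way_class(
  lfc_x  = c(2.0, 1.5, 0.3, 0.1),
  lfc_y  = c(-2.2, 0.2, 1.9, 0.4),
  padj_x = c(0.01, 0.02, 0.01, 0.60),
  padj_y = c(0.001, 0.03, 0.04, 0.70)
)
classes
```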

      Example S11: Highlighting data points

      Overview

      For point-based plots, users can highlight IDs of interest (e.g. genes or transcripts). Currently, this functionality is implemented in the following functions:

      • vsScatterPlot()
      • vsMAPlot()
      • vsVolcano()
      • vsFourWay()

      To use this feature, simply provide a vector of specified IDs to the highlight parameter found in the prior functions. An example of a typical vector would be as follows:

      important_ids <- c(
        "ID_001",
        "ID_002",
        "ID_003",
        "ID_004",
        "ID_005"
      )
      important_ids
      ## [1] "ID_001" "ID_002" "ID_003" "ID_004" "ID_005"

      For specific examples using the toy data sets, please see the following four subsections.
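Before passing a vector to the highlight parameter, it can help to confirm that every ID actually occurs in the data. The sketch below is a base R check. The helper name `check_highlight()` and the `available` vector are hypothetical, and where the IDs live in a real Cuffdiff, DESeq2, or edgeR object depends on the tool.

```r
# Hypothetical helper: split requested IDs into those present in the data
# and those that are absent
check_highlight <- function(ids, available_ids) {
  ids <- unique(ids)
  list(
    found   = intersect(ids, available_ids),
    missing = setdiff(ids, available_ids)
  )
}

# Stand-in for the IDs of a data set (assumption)
available <- sprintf("ID_%03d", 1:4)

important_ids <- c("ID_001", "ID_002", "ID_003", "ID_004", "ID_005")
res <- check_highlight(important_ids, available)
res$missing
```

Any entries reported in `res$missing` can then be corrected or dropped before plotting.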

      Highlighting with vsScatterPlot()

      data("df.cuff")
      hl <- c(
        "XLOC_000033",
        "XLOC_000099",
        "XLOC_001414",
        "XLOC_001409"
      )
      vsScatterPlot(
          x = "hESC", y = "iPS", data = df.cuff, d.factor = NULL,
          type = "cuffdiff", title = TRUE, grid = TRUE, highlight = hl
      )


      Figure 37: Highlighting with vsScatterPlot()
      IDs of interest can be identified within basic scatter plots. When highlighted, non-important points will turn grey while highlighted points will turn blue. Text tags will try to position themselves within the graph without overlapping each other.

      Highlighting with vsMAPlot()

      hl <- c(
        "FBgn0022201",
        "FBgn0003042",
        "FBgn0031957",
        "FBgn0033853",
        "FBgn0003371"
      )
      vsMAPlot(
          x = "treated_paired.end", y = "untreated_paired.end",
          data = df.deseq, d.factor = "condition", type = "deseq",
          padj = 0.05, y.lim = NULL, lfc = NULL, title = TRUE,
          legend = TRUE, grid = TRUE, data.return = FALSE, highlight = hl
      )


      Figure 38: Highlighting with vsMAPlot()
      IDs of interest can be identified within MA plots. When highlighted, non-important points will become more transparent (i.e. lower alpha values) while highlighted points will turn red. Text tags will try to position themselves within the graph without overlapping each other.

      Highlighting with vsVolcano()

      hl <- c(
        "FBgn0036248",
        "FBgn0026573",
        "FBgn0259742",
        "FBgn0038961",
        "FBgn0038928"
      )
      vsVolcano(
          x = "treated_paired.end", y = "untreated_paired.end",
          data = df.deseq, d.factor = "condition",
          type = "deseq", padj = 0.05, x.lim = NULL, lfc = NULL,
          title = TRUE, grid = TRUE, data.return = FALSE, highlight = hl
      )


      Figure 39: Highlighting with vsVolcano()
      IDs of interest can be identified within volcano plots. When highlighted, non-important points will become more transparent (i.e. lower alpha values) while highlighted points will turn red. Text tags will try to position themselves within the graph without overlapping each other.

      Highlighting with vsFourWay()

      data("df.edger")
      hl <- c(
          "ID_639",
          "ID_518",
          "ID_602",
          "ID_449",
          "ID_076"
      )
      vsFourWay(
          x = "WM", y = "WW", control = "MM", data = df.edger,
          d.factor = NULL, type = "edger", padj = 0.05, x.lim = NULL,
          y.lim = NULL, lfc = 2, title = TRUE, grid = TRUE,
          data.return = FALSE, highlight = hl
      )


      Figure 40: Highlighting with vsFourWay()
      IDs of interest can be identified within four-way plots. When highlighted, non-important points will become more transparent (i.e. lower alpha values) while highlighted points will turn dark grey. Text tags will try to position themselves within the graph without overlapping each other.

      Example S12: Extracting datasets from plots

      Overview

      For all plots, users can extract datasets used for the visualizations. You may want to pursue this option if you want to use a highly customized plot script or you would like to perform some unmentioned analysis, for example.

      To use this feature, set the data.return parameter in the function you are using to TRUE. You will also need to assign the result of the function call to an object. See the following example for further details.

      The data extraction process

      In this example, we will use the toy data set df.cuff, a Cuffdiff output, with the function vsScatterPlot(). Take note that we are assigning the result of the function call to an object tmp:

      # Extract data frame from visualization
      data("df.cuff")
      tmp <- vsScatterPlot(
         x = "hESC", y = "iPS", data = df.cuff, d.factor = NULL,
         type = "cuffdiff", title = TRUE, grid = TRUE, data.return = TRUE
      )

      The object we have created is a list with two elements: data and plot. To extract the data, we can call the first element of the list using the subset method (<object>[[1]]) or by invoking its element name (<object>$data):

      df_scatter <- tmp[[1]] ## or use tmp$data
      head(df_scatter)
      ##            id           x         y
      ## 1 XLOC_000001 3.47386e-01  20.21750
      ## 2 XLOC_000002 0.00000e+00   0.00000
      ## 3 XLOC_000003 0.00000e+00   0.00000
      ## 4 XLOC_000004 6.97259e+05   0.00000
      ## 5 XLOC_000005 6.96704e+02 355.82300
      ## 6 XLOC_000006 0.00000e+00   1.51396
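As an illustration of the "unmentioned analysis" use case, the sketch below rebuilds the six rows printed above as a stand-alone data frame and derives a log2 ratio between the two columns. The pseudocount of 1 and the filtering rule are assumptions chosen for this example. They are not part of the package.

```r
# First six rows of the extracted data frame, copied from the output above
df_scatter <- data.frame(
  id = sprintf("XLOC_%06d", 1:6),
  x  = c(3.47386e-01, 0, 0, 6.97259e+05, 6.96704e+02, 0),
  y  = c(20.21750, 0, 0, 0, 355.82300, 1.51396)
)

# Drop IDs with zero expression in both samples
expressed <- df_scatter[df_scatter$x > 0 | df_scatter$y > 0, ]

# log2 ratio of y over x; the pseudocount of 1 (assumption) avoids log(0)
expressed$log2_ratio <- log2((expressed$y + 1) / (expressed$x + 1))

# Rank IDs by the magnitude of the ratio
expressed[order(-abs(expressed$log2_ratio)), c("id", "log2_ratio")]
```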

      Return the plot

      Because the function call returns a list, the plot is stored as another element. To extract the plot, we can call the second element of the list using the aforementioned procedures:

      my_plot <- tmp[[2]] ## or use tmp$plot
      my_plot

      Example S13: Changing text sizes

      Overview

      For all functions, users can modify the font size of several portions of the plot. The adjustable components are:

      • Axis text and titles
      • Plot title
      • Legend text and titles
      • Facet titles

      To manipulate these components, users can modify the default values of the following parameters:

      • xaxis.text.size
      • yaxis.text.size
      • xaxis.title.size
      • yaxis.title.size
      • main.title.size
      • legend.text.size
      • legend.title.size
      • facet.title.size

      What exactly can you manipulate?

      Each of the parameters mentioned in the prior section takes a numerical value. These values correspond to font sizes in typographic points. To illustrate what exactly these parameters modify, please refer to the following figure:

      Figure 41: A visual guide to text size parameters
      Users can modify these components which are highlighted by their respective parameter.

      The facet.title.size parameter refers to the facets which are allocated in the matrix functions (vsScatterMatrix(), vsMAMatrix(), vsVolcanoMatrix()). This is illustrated in the following figure:

      Figure 42: Location of facet titles
      Facet title sizes can be modified using the facet.title.size parameter.

      Since not all functions are equal in their parameters and component layout, some functions will lack some of the prior parameters. To see which functions have which parameters, please refer to the following figure:

      Figure 43: An overview of text size parameters for each function
      Cells highlighted in red refer to parameters (columns) which are found in their respective functions (rows). Cells which are grey indicate parameters which are not found in each of the functions.
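Because the set of text-size parameters differs between functions, one convenient pattern is to keep a single list of preferred sizes and pass each function only the entries it declares. The base R sketch below does this with `formals()`. The helper `supported_args()`, the size values, and the stand-in function `demo_plot_fn()` are hypothetical, and only the parameter names come from the list above.

```r
# Preferred sizes in typographic points (values are arbitrary examples)
text_sizes <- list(
  xaxis.text.size = 9,   yaxis.text.size = 9,
  xaxis.title.size = 11, yaxis.title.size = 11,
  main.title.size = 14,  legend.text.size = 8,
  legend.title.size = 9, facet.title.size = 10
)

# Hypothetical helper: keep only the arguments that `fn` declares
supported_args <- function(fn, args) {
  args[names(args) %in% names(formals(fn))]
}

# Stand-in plotting function with a subset of the parameters; in practice
# this would be one of the package functions, e.g. vsVolcano
demo_plot_fn <- function(data, main.title.size = 15, xaxis.text.size = 10,
                         legend.text.size = 9) NULL

usable <- supported_args(demo_plot_fn, text_sizes)
names(usable)
```

The filtered list can then be appended to the usual arguments with `do.call()`, for example `do.call(demo_plot_fn, c(list(data = NULL), usable))`.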

      Method S1: Determining data point shape and size changes

      The shape and size of each data point will also change depending on several conditions. To maximize the viewing area while retaining high resolution, some data points will not be present within the viewing area. If they exceed the viewing area, they will change shape from a circle to a triangular orientation.

      How far these points exceed the viewing area (in terms of fold change) is classified by the following criteria:

      • SUB – values that fall within the viewing area of the plot.
      • T-1 – values that are greater than the maximum viewing area and are less than the 25th percentile of values that exceed the viewing area.
      • T-2 – Similar to T-1; values fall between the 25th and 50th percentile.
      • T-3 – Similar to T-2; values fall between the 50th and 75th percentile.
      • T-4 – Similar to T-3; values fall between the 75th and 100th percentile.

      To further clarify these conditions, please refer to the following figure:

      Figure 44: An illustration detailing the principles behind the node size for
      the differential gene expression functions. In this figure, the data points increase in size depending on the quartile in which they reside as the absolute LFC increases (top bar). Data points that fall within the viewing area are classified as SUB, while data points that exceed this area are classified as T-1 through T-4.
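The SUB and T-1 to T-4 classes can be reproduced approximately in base R. The sketch below assumes a symmetric viewing limit on the absolute LFC and uses `quantile()` with its default method. The helper `size_class()` and the toy values are hypothetical, and the package's own boundary handling may differ.

```r
# Hypothetical helper: assign SUB / T-1..T-4 from absolute LFC values
size_class <- function(lfc, limit) {
  mag <- abs(lfc)
  cls <- rep("SUB", length(mag))      # inside the viewing area
  out <- mag > limit                  # points beyond the viewing area
  if (any(out)) {
    # 25th, 50th and 75th percentiles of the out-of-range values only
    q <- quantile(mag[out], probs = c(0.25, 0.50, 0.75), names = FALSE)
    # 0 = at or below q25, 1 = (q25, q50], 2 = (q50, q75], 3 = above q75
    tier <- findInterval(mag[out], q, left.open = TRUE) + 1
    cls[out] <- paste0("T-", tier)
  }
  cls
}

# Toy LFC values with a viewing limit of |LFC| = 3 (assumptions)
lfc <- c(0.5, -1.2, 3.1, -4.0, 6.5, 9.0, -12.0)
tiers <- size_class(lfc, limit = 3)
tiers
```

Mapping these classes to increasing point sizes gives the scaling illustrated in Figure 44.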

      Method S2: Determining function performance

      Function efficiencies were determined by measuring run times with the microbenchmark R package. Each function was run 100 times with the prior code used in the documentation. All benchmarks were determined on a machine running a 64-bit Windows 10 operating system, 8 GB of RAM, and an Intel Core i5-6400 processor running at 2.7 GHz.
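A comparable measurement can be sketched without extra dependencies. The example below times a stand-in workload 100 times using base R's `proc.time()` instead of the microbenchmark package, so its resolution is coarser than the original benchmarks. The helper `time_runs()` and the `workload()` function are hypothetical. To benchmark a plotting call, wrap it in a function and pass it in place of `workload`.

```r
# Hypothetical helper: run `fn` `times` times and return the elapsed
# wall-clock time of each run in milliseconds
time_runs <- function(fn, times = 100) {
  vapply(seq_len(times), function(i) {
    start <- proc.time()[["elapsed"]]
    fn()
    (proc.time()[["elapsed"]] - start) * 1000
  }, numeric(1))
}

# Stand-in workload (assumption); replace with a plotting call
workload <- function() sum(sort(runif(1e4)))

ms <- time_runs(workload, times = 100)
summary(ms)
```

With microbenchmark installed, the equivalent call would be `microbenchmark::microbenchmark(workload(), times = 100)`.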

      Scatterplots

      Figure 45: Benchmarks for the vsScatterPlot() function
      Time (ms) distributions were generated for this function using 100 trials for each of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example data sets contained 1200, 724, and 29391 transcripts, respectively.

      Scatterplot matrices

      Figure 46: Benchmarks for the vsScatterMatrix() function
      Time (ms) distributions were generated for this function using 100 trials for each of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example data sets contained 1200, 724, and 29391 transcripts, respectively.

      Box plots

      Figure 47: Benchmarks for the vsBoxPlot() function
      Time (ms) distributions were generated for this function using 100 trials for each of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example data sets contained 1200, 724, and 29391 transcripts, respectively.

      Differential gene expression matrices

      Figure 48: Benchmarks for the vsDEGMatrix() function
      Time (ms) distributions were generated for this function using 100 trials for each of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example data sets contained 1200, 724, and 29391 transcripts, respectively.

      Volcano plots

      Figure 49: Benchmarks for the vsVolcano() function
      Time (ms) distributions were generated for this function using 100 trials for each of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example data sets contained 1200, 724, and 29391 transcripts, respectively.

      Volcano plot matrices

      Figure 50: Benchmarks for the vsVolcanoMatrix() function
      Time (ms) distributions were generated for this function using 100 trials for each of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example data sets contained 1200, 724, and 29391 transcripts, respectively.

      MA plots

      Figure 51: Benchmarks for the vsMAPlot() function
      Time (ms) distributions were generated for this function using 100 trials for each of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example data sets contained 1200, 724, and 29391 transcripts, respectively.

      MA matrices

      Figure 52: Benchmarks for the vsMAMatrix() function
      Time (s) distributions were generated for this function using 100 trials for each of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example data sets contained 1200, 724, and 29391 transcripts, respectively.

      Four way plots

      Figure 53: Benchmarks for the vsFourWay() function
      Time (ms) distributions were generated for this function using 100 trials for each of the three RNAseq data objects. Cuffdiff, DESeq2, and edgeR example data sets contained 1200, 724, and 29391 transcripts, respectively.

      Session info

      ## R version 4.1.0 (2021-05-18)
      ## Platform: x86_64-pc-linux-gnu (64-bit)
      ## Running under: Ubuntu 20.04.2 LTS
      ## 
      ## Matrix products: default
      ## BLAS:   /home/biocbuild/bbs-3.13-bioc/R/lib/libRblas.so
      ## LAPACK: /home/biocbuild/bbs-3.13-bioc/R/lib/libRlapack.so
      ## 
      ## locale:
      ##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
      ##  [3] LC_TIME=en_GB              LC_COLLATE=C              
      ##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
      ##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
      ##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
      ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
      ## 
      ## attached base packages:
      ## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
      ## [8] methods   base     
      ## 
      ## other attached packages:
      ##  [1] edgeR_3.34.0                limma_3.48.0               
      ##  [3] DESeq2_1.32.0               SummarizedExperiment_1.22.0
      ##  [5] Biobase_2.52.0              MatrixGenerics_1.4.0       
      ##  [7] matrixStats_0.58.0          GenomicRanges_1.44.0       
      ##  [9] GenomeInfoDb_1.28.0         IRanges_2.26.0             
      ## [11] S4Vectors_0.30.0            BiocGenerics_0.38.0        
      ## [13] vidger_1.12.0               BiocStyle_2.20.0           
      ## 
      ## loaded via a namespace (and not attached):
      ##  [1] bitops_1.0-7           bit64_4.0.5            RColorBrewer_1.1-2    
      ##  [4] httr_1.4.2             tools_4.1.0            bslib_0.2.5.1         
      ##  [7] utf8_1.2.1             R6_2.5.0               DBI_1.1.1             
      ## [10] colorspace_2.0-1       withr_2.4.2            tidyselect_1.1.1      
      ## [13] GGally_2.1.1           bit_4.0.4              compiler_4.1.0        
      ## [16] DelayedArray_0.18.0    labeling_0.4.2         bookdown_0.22         
      ## [19] sass_0.4.0             scales_1.1.1           genefilter_1.74.0     
      ## [22] stringr_1.4.0          digest_0.6.27          rmarkdown_2.8         
      ## [25] XVector_0.32.0         pkgconfig_2.0.3        htmltools_0.5.1.1     
      ## [28] highr_0.9              fastmap_1.1.0          rlang_0.4.11          
      ## [31] RSQLite_2.2.7          farver_2.1.0           jquerylib_0.1.4       
      ## [34] generics_0.1.0         jsonlite_1.7.2         BiocParallel_1.26.0   
      ## [37] dplyr_1.0.6            RCurl_1.98-1.3         magrittr_2.0.1        
      ## [40] GenomeInfoDbData_1.2.6 Matrix_1.3-3           Rcpp_1.0.6            
      ## [43] munsell_0.5.0          fansi_0.4.2            lifecycle_1.0.0       
      ## [46] stringi_1.6.2          yaml_2.2.1             zlibbioc_1.38.0       
      ## [49] plyr_1.8.6             grid_4.1.0             blob_1.2.1            
      ## [52] ggrepel_0.9.1          crayon_1.4.1           lattice_0.20-44       
      ## [55] Biostrings_2.60.0      splines_4.1.0          annotate_1.70.0       
      ## [58] KEGGREST_1.32.0        magick_2.7.2           locfit_1.5-9.4        
      ## [61] knitr_1.33             pillar_1.6.1           geneplotter_1.70.0    
      ## [64] XML_3.99-0.6           glue_1.4.2             evaluate_0.14         
      ## [67] BiocManager_1.30.15    png_0.1-7              vctrs_0.3.8           
      ## [70] tidyr_1.1.3            gtable_0.3.0           purrr_0.3.4           
      ## [73] reshape_0.8.8          assertthat_0.2.1       cachem_1.0.5          
      ## [76] ggplot2_3.3.3          xfun_0.23              xtable_1.8-4          
      ## [79] survival_3.2-11        tibble_3.1.2           AnnotationDbi_1.54.0  
      ## [82] memoise_2.0.0          ellipsis_0.3.2