• 主页
  • 课程

    关于课程

    • 课程归档
    • 成为一名讲师
    • 讲师信息
    同等学历教学

    同等学历教学

    免费
    阅读更多
  • 特色
    • 展示
    • 关于我们
    • 问答
  • 事件
  • 个性化
  • 博客
  • 联系
  • 站点资源
    有任何问题吗?
    (00) 123 456 789
    weinfoadmin@weinformatics.cn
    注册登录
    恒诺新知
    • 主页
    • 课程

      关于课程

      • 课程归档
      • 成为一名讲师
      • 讲师信息
      同等学历教学

      同等学历教学

      免费
      阅读更多
    • 特色
      • 展示
      • 关于我们
      • 问答
    • 事件
    • 个性化
    • 博客
    • 联系
    • 站点资源

      单细胞RNA-seq测序

      • 首页
      • 博客
      • 单细胞RNA-seq测序
      • 5 scRNA-seq Analysis with Bioconductor

      5 scRNA-seq Analysis with Bioconductor

      • 发布者 xu, xintian
      • 分类 单细胞RNA-seq测序, 未分类
      • 日期 2021年9月9日
      • 评论 0评论

      专题介绍:单细胞RNA-seq被评为2018年重大科研进展,但实际上这是老技术。2015年,商品化单细胞RNA测序流程已经建立,成果发表在Cell上。今年井喷式发文章,关注点那么高,是因为最近这项技术全面商品化了。

      QUESTIONS

      • How can I import single-cell data into R?
      • How are different types of data/information (e.g. cell information, gene information, etc.) stored and manipulated?
      • How can I obtain basic summary metrics for cells and genes and filter the data accordingly?
      • How can I visually explore these metrics?

      LEARNING OBJECTIVES

      • Understand how single-cell data is stored in the Bioconductor SingleCellExperiment object.
      • Create a SingleCellExperiment object from processed scRNA-seq count data.
      • Access the different parts of a SingleCellExperiment object, such as rowData, colData and assay.
      • Obtain several summary metrics from a matrix, to summarise information across cells or genes.
      • Apply basic filters to the data, by constructing logical vectors and subset the object using the [ operator.
      • Produce basic data visualisations directly fron the data stored in the SingleCellExperiment object.

      In this chapter we will start our practical introduction of the core packages used in our analysis.

      We will use a dataset of induced pluripotent stem cells generated from three different individuals (Tung et al. 2017) in Yoav Gilad’s lab at the University of Chicago. The experiments were carried out on the Fluidigm C1 platform using unique molecular identifiers (UMIs) for quantification. The data files are located in the data/tung folder in your working directory. The original file can be found on the public NCBI repository GEO accession GSE77288 (file named: GSE77288_molecules-raw-single-per-sample.txt.gz).

      NOTE:

      A couple of things that are missing from these materials:

      • sparse matrix. Because it’s using the tung dataset, we just get a regular matrix. If we decide to use a 10x dataset instead, it might be easier to introduce that as well.
      • rowData – I didn’t include any gene annotation, although we could easily include that as well, for example with information about which genes are nuclear or mitochondrial.

      5.1 Packages for scRNA-seq Analysis

      There are several possible software packages (or package “ecosystems”) that can be used for single-cell analysis. In this course we’re going to focus on a collection of packages that are part of the Bioconductor project.

      Bioconductor is a repository of R packages specifically developed for biological analyses. It has an excellent collection of packages for scRNA-seq analysis, which are summarised in the Orchestrating Single-Cell Analysis with Bioconductor book. The advantage of Bioconductor is that it has strict requirements for package submission, including installation on every platform and full documentation with a tutorial (called a vignette) explaining how the package should be used. Bioconductor also encourages utilization of standard data structures/classes and coding style/naming conventions, so that, in theory, packages and analyses can be combined into large pipelines or workflows. For scRNA-seq specifically, the standard data object used is called SingleCellExperiment, which we will learn more about in this section.

      Seurat is another popular R package that uses its own data object called Seurat. The Seurat package includes a very large collection of functions and excellent documentation. Because of its popularity, several other packages nowadays are also compatible with the Seurat object. Although not the main focus of this course, we have a section illustrating an analysis workflow using this package: Analysis of scRNA-seq with Seurat.

      Scanpy is a popular python package for scRNA-seq analysis, which stores data in an object called AnnData (annotated data). Similarly to Seurat and Bioconductor, developers can write extensions to the main package compatible with the AnnData package, allowing the community to expand on the functionality.

      Although our main focus is on Bioconductor packages, the concepts we will learn about throughout this course should apply to the other alternatives, the main thing that changes is the exact syntax used.

      5.2 The SingleCellExperiment Object

      Expression data is usually stored as a feature-by-sample matrix of expression quantification. In scRNA-seq analysis we typically start our analysis from a matrix of counts, representing the number of reads/UMIs that aligned to a particular feature for each cell. Features can be things like genes, isoforms or exons. Usually, analyses are done at the gene-level, and that is what we will focus on in this course.

      Besides our quantification matrix, we may also have information about each gene (e.g. their genome location, which type of gene they are, their length, etc.) and information about our cells (e.g. their tissue of origin, patient donor, processing batch, disease status, treatment exposure, etc.).

      We may also produce other matrices from our raw count data, for example a matrix of normalised counts. And finally, because single-cell experiment data is very high dimensional (with thousands of cells and thousands of genes), we often employ dimensionality reduction techniques to capture the main variation in the data at lower dimensions.

      SingleCellExperiment (SCE for short) is an object that stores all this information in a synchronised manner. The different parts of the object are referred to as “slots”:

      • One or more matrices of expression – stored in the assay slots.
      • Information about the genes (the rows of the object) – stored in the rowData slot.
      • Information about the cells (the columns of the object) – stored in the colData slot.

      The SingleCellExperiment object. Features (e.g. genes, isoforms or exons) are stored as rows and their metadata is in a rowData slot. Cells are stored as columns, with their metadata in colData slot. Matrices of expression are stored in the assay slots. Dimensionality reduction projections of the cells are stored in the reducedDim slots.

      5.2.1 Creating SCE Objects

      Let’s start by creating a SingleCellExperiment object from our data. We have two files in data/tung:

      • counts.txt – a tab-delimited text file with the gene counts for each gene/cell.
      • annotation.txt – a tab-delimited text file with the cell annotations.

      Let’s read these into R, using the standard read.table() function:

      tung_counts <- read.table("data/tung/molecules.txt", sep = "\t")
      tung_annotation <- read.table("data/tung/annotation.txt", sep = "\t", header = TRUE)

      We can now create a SingleCellExperiment object using the function of the same name:

      # load the library
      library(SingleCellExperiment)
      
      # note that the data passed to the assay slot has to be a matrix!
      tung <- SingleCellExperiment(
        assays = list(counts = as.matrix(tung_counts)),
        colData = tung_annotation
      )
      
      # remove the original tables as we don't need them anymore
      rm(tung_counts, tung_annotation)

      If we print the contents of this object, we will get several useful pieces of information:

      tung
      ## class: SingleCellExperiment 
      ## dim: 19027 864 
      ## metadata(0):
      ## assays(1): counts
      ## rownames(19027): ENSG00000237683 ENSG00000187634 ... ERCC-00170
      ##   ERCC-00171
      ## rowData names(0):
      ## colnames(864): NA19098.r1.A01 NA19098.r1.A02 ... NA19239.r3.H11
      ##   NA19239.r3.H12
      ## colData names(5): individual replicate well batch sample_id
      ## reducedDimNames(0):
      ## mainExpName: NULL
      ## altExpNames(0):
      • We have 20327 genes (rows) and 864 cells (columns).
      • There is a single assay named “counts.”
      • We can preview some of the gene names (“rownames”) and cell names (“colnames”).
      • There is no gene metadata (“rowData” is empty).
      • We can see some of the metadata for cells is (“colData names”).

      To access different parts of the SCE object, we can use the following accessor functions:

      FunctionDescription
      rowData(sce)Table of gene metadata.
      colData(sce)Table of cell metadata.
      assay(sce, "counts")The assay named “counts.”
      reducedDim(sce, "PCA")The reduced dimensionality table named “PCA”
      sce$colnameShortcut to access the colum “colname” from colData. This is equivalent to colData(sce)$colname
      sce[<rows>, <columns>]We can use the square brackets to subset the SCE object by rows or columns, similarly to how we subset matrix or data.frame objects

      Naming Assays

      Assays can have any name we wish. However, there are some conventions we can follow:

      • counts: Raw count data, e.g. number of reads or transcripts for a particular gene.
      • normcounts: Normalized values on the same scale as the original counts. For example, counts divided by cell-specific size factors that are centred at unity.
      • logcounts: Log-transformed counts or count-like values. In most cases, this will be defined as log-transformed normcounts, e.g. using log base 2 and a pseudo-count of 1.
      • cpm: Counts-per-million. This is the read count for each gene in each cell, divided by the library size of each cell in millions.
      • tpm: Transcripts-per-million. This is the number of transcripts for each gene in each cell, divided by the total number of transcripts in that cell (in millions).

      Each of these has a function, so that we can access the “counts” assay using the counts() function. Therefore, these two are equivalent:

      counts(tung)
      assay(tung, "counts")

      Creating SingleCellExperiment objects like we did above should work for any use case, as long as we have a matrix of counts that we can read to a file. However, to read the output of the popular tool cellranger (used to quantify 10x Chromium data), there is a dedicated function in the DropletUtils package, which simplifies the process of importing the data. Here is an example usage:

      library(DropletUtils)
      
      # importing the raw count data
      sce <- read10xCounts("data/pbmc_1k_raw")
      
      # importing the pre-filtered count data
      sce <- read10xCounts("data/pbmc_1k_filtered")

      Exercise 1

      1. What are the classes of the “colData” and “assay” slots of our tung object?Hint
      2. How many batches and cells per batch are there? Does that number make sense?Hint

      Answer

      5.2.2 Modifying SCE Objects

      To modify parts of our SCE object we can use the <- assignment operator, together with the part of the object we wish to modify. For example, to create a new assay: assay(sce, "name_of_new_assay") <- new_matrix. Other use cases are summarised in the table below.

      As an example, let’s create a simple transformation of our count data, by taking the log base 2 with an added pseudocount of 1 (otherwise log(0) = -Inf):

      assay(tung, "logcounts") <- log2(counts(tung) + 1)

      Because we named our assay “logcounts,” and this is one of the conventional assay names, we can use the dedicated function to access it:

      # first 10 rows and 4 columns of the logcounts assay
      logcounts(tung)[1:10, 1:4] # or: assay(tung, "logcounts")[1:10, 1:5]
      ##                 NA19098.r1.A01 NA19098.r1.A02 NA19098.r1.A03 NA19098.r1.A04
      ## ENSG00000237683       0.000000       0.000000       0.000000              1
      ## ENSG00000187634       0.000000       0.000000       0.000000              0
      ## ENSG00000188976       2.000000       2.807355       1.000000              2
      ## ENSG00000187961       0.000000       0.000000       0.000000              0
      ## ENSG00000187583       0.000000       0.000000       0.000000              0
      ## ENSG00000187642       0.000000       0.000000       0.000000              0
      ## ENSG00000188290       0.000000       0.000000       0.000000              0
      ## ENSG00000187608       1.000000       3.000000       0.000000              2
      ## ENSG00000188157       1.584963       2.000000       1.584963              2
      ## ENSG00000237330       0.000000       0.000000       0.000000              0

      Here is a summary of other ways to modify data in SCE objects:

      CodeDescription
      assay(sce, "name") <- matrixAdd a new assay matrix. The new matrix has to have matching rownames and colnames to the existing object.
      rowData(sce) <- data_frameReplace rowData with a new table (or add one if it does not exist).
      colData(sce) <- data_frameReplace colData with a new table (or add one if it does not exist).
      colData(sce)$column_name <- valuesAdd a new column to the colData table (or replace it if it already exists).
      rowData(sce)$column_name <- valuesAdd a new column to the rowData table (or replace it if it already exists).
      reducedDim(sce, "name") <- matrixAdd a new dimensionality reduction matrix. The new matrix has to have matching colnames to the existing object.

      5.2.3 Matrix Statistics

      Because the main data stored in SingleCellExperiment objects is a matrix, it is useful to cover some functions that calculate summary metrics across rows or columns of a matrix. There are several functions to do this, detailed in the information box below.

      For example, to calculate the mean counts per cell (columns) in our dataset:

      colMeans(counts(tung))

      We could add this information to our column metadata as a new column, which we could do as:

      colData(tung)$mean_counts <- colMeans(counts(tung))

      If we look at the colData slot we can see the new column has been added:

      colData(tung)
      ## DataFrame with 864 rows and 6 columns
      ##                 individual   replicate        well       batch      sample_id
      ##                <character> <character> <character> <character>    <character>
      ## NA19098.r1.A01     NA19098          r1         A01  NA19098.r1 NA19098.r1.A01
      ## NA19098.r1.A02     NA19098          r1         A02  NA19098.r1 NA19098.r1.A02
      ## NA19098.r1.A03     NA19098          r1         A03  NA19098.r1 NA19098.r1.A03
      ## NA19098.r1.A04     NA19098          r1         A04  NA19098.r1 NA19098.r1.A04
      ## NA19098.r1.A05     NA19098          r1         A05  NA19098.r1 NA19098.r1.A05
      ## ...                    ...         ...         ...         ...            ...
      ## NA19239.r3.H08     NA19239          r3         H08  NA19239.r3 NA19239.r3.H08
      ## NA19239.r3.H09     NA19239          r3         H09  NA19239.r3 NA19239.r3.H09
      ## NA19239.r3.H10     NA19239          r3         H10  NA19239.r3 NA19239.r3.H10
      ## NA19239.r3.H11     NA19239          r3         H11  NA19239.r3 NA19239.r3.H11
      ## NA19239.r3.H12     NA19239          r3         H12  NA19239.r3 NA19239.r3.H12
      ##                mean_counts
      ##                  <numeric>
      ## NA19098.r1.A01     3.32801
      ## NA19098.r1.A02     3.36238
      ## NA19098.r1.A03     2.29306
      ## NA19098.r1.A04     2.83397
      ## NA19098.r1.A05     3.72560
      ## ...                    ...
      ## NA19239.r3.H08    3.097861
      ## NA19239.r3.H09    2.933568
      ## NA19239.r3.H10    0.316813
      ## NA19239.r3.H11    2.678615
      ## NA19239.r3.H12    4.068482

      There are several functions that can be used to calculate summary metrics – such as mean, median, variance, etc. – across rows or columns of a matrix (or a sparse matrix).More

      Exercise 2

      1. Add a new column to colData named “total_counts” with the sum of counts in each cell.
      2. Create a new assay called “cpm” (Counts-Per-Million), which contains the result of dividing the counts matrix by the total counts in millions.
      3. How can you access this new assay?

      Answer

      5.2.4 Subsetting SCE Objects

      Similarly to the standard data.frame and matrix objects in R, we can use the [ operator to subset our SingleCellExperiment either by rows (genes) or columns (cells). The general syntax is: sce[rows_of_interest, columns_of_interest].

      For example:

      # subset by numeric index
      tung[1:3, ] # the first 3 genes, keep all cells
      tung[, 1:3] # the first 3 cells, keep all genes
      tung[1:3, 1:2] # the first 3 genes and first 2 cells
      
      # subset by name
      tung[c("ENSG00000069712", "ENSG00000237763"), ]
      tung[, c("NA19098.r1.A01", "NA19098.r1.A03")]
      tung[c("ENSG00000069712", "ENSG00000237763"), c("NA19098.r1.A01", "NA19098.r1.A03")]

      Although manually subsetting the object can sometimes be useful, more often we want to do conditional subsetting based on TRUE/FALSE logical statements. This is extremely useful for filtering our data. Let see some practical examples.

      Let’s say we wanted to retain genes with a mean count greater than 0.01. As we saw, to calculate the mean counts per gene (rows in the SCE object), we can use the rowMeans() function:

      # calculate the mean counts per gene
      gene_means <- rowMeans(counts(tung))
      
      # print the first 10 values
      gene_means[1:10]
      ## ENSG00000237683 ENSG00000187634 ENSG00000188976 ENSG00000187961 ENSG00000187583 
      ##      0.28240741      0.03009259      2.63888889      0.23842593      0.01157407 
      ## ENSG00000187642 ENSG00000188290 ENSG00000187608 ENSG00000188157 ENSG00000237330 
      ##      0.01273148      0.02430556      1.89583333      3.32523148      0.02199074

      We can turn this into a TRUE/FALSE vector by using a logical operator:

      gene_means[1:10] > 0.01
      ## ENSG00000237683 ENSG00000187634 ENSG00000188976 ENSG00000187961 ENSG00000187583 
      ##            TRUE            TRUE            TRUE            TRUE            TRUE 
      ## ENSG00000187642 ENSG00000188290 ENSG00000187608 ENSG00000188157 ENSG00000237330 
      ##            TRUE            TRUE            TRUE            TRUE            TRUE

      We can use such a logical vector inside [ to filter our data, which will return only the cases where the value is TRUE:

      tung[gene_means > 0.01, ]
      ## class: SingleCellExperiment 
      ## dim: 16052 864 
      ## metadata(0):
      ## assays(3): counts logcounts cpm
      ## rownames(16052): ENSG00000237683 ENSG00000187634 ... ERCC-00170
      ##   ERCC-00171
      ## rowData names(0):
      ## colnames(864): NA19098.r1.A01 NA19098.r1.A02 ... NA19239.r3.H11
      ##   NA19239.r3.H12
      ## colData names(7): individual replicate ... mean_counts total_counts
      ## reducedDimNames(0):
      ## mainExpName: NULL
      ## altExpNames(0):

      Notice how the resulting SCE object has fewer genes than the original.

      Another common use case is to retain cells with a certain number of genes above a certain threshold of expression. For this question, we need to break the problem into parts. First let’s check in our counts matrix, which genes are expressed above a certain threshold:

      # counts of at least 1
      counts(tung) > 0
      ##                 NA19098.r1.A01 NA19098.r1.A02 NA19098.r1.A03 NA19098.r1.A04
      ## ENSG00000237683          FALSE          FALSE          FALSE           TRUE
      ## ENSG00000187634          FALSE          FALSE          FALSE          FALSE
      ## ENSG00000188976           TRUE           TRUE           TRUE           TRUE
      ## ENSG00000187961          FALSE          FALSE          FALSE          FALSE
      ## ENSG00000187583          FALSE          FALSE          FALSE          FALSE
      ## ENSG00000187642          FALSE          FALSE          FALSE          FALSE
      ## ENSG00000188290          FALSE          FALSE          FALSE          FALSE
      ## ENSG00000187608           TRUE           TRUE          FALSE           TRUE
      ## ENSG00000188157           TRUE           TRUE           TRUE           TRUE
      ## ENSG00000237330          FALSE          FALSE          FALSE          FALSE

      We can see that our matrix is now composed of only TRUE/FALSE values. Because TRUE/FALSE are encoded as 1/0, we can use colSums() to calculate the total number of genes above this threshold per cell:

      # total number of detected genes per cell
      total_detected_per_cell <- colSums(counts(tung) > 0)
      
      # print the first 10 values
      total_detected_per_cell[1:10]
      ## NA19098.r1.A01 NA19098.r1.A02 NA19098.r1.A03 NA19098.r1.A04 NA19098.r1.A05 
      ##           8368           8234           7289           7985           8619 
      ## NA19098.r1.A06 NA19098.r1.A07 NA19098.r1.A08 NA19098.r1.A09 NA19098.r1.A10 
      ##           8659           8054           9429           8243           9407

      Finally, we can use this vector to apply our final condition, for example that we want cells with at least 5000 detected genes:

      tung[, total_detected_per_cell > 5000]
      ## class: SingleCellExperiment 
      ## dim: 19027 849 
      ## metadata(0):
      ## assays(3): counts logcounts cpm
      ## rownames(19027): ENSG00000237683 ENSG00000187634 ... ERCC-00170
      ##   ERCC-00171
      ## rowData names(0):
      ## colnames(849): NA19098.r1.A01 NA19098.r1.A02 ... NA19239.r3.H11
      ##   NA19239.r3.H12
      ## colData names(7): individual replicate ... mean_counts total_counts
      ## reducedDimNames(0):
      ## mainExpName: NULL
      ## altExpNames(0):

      Notice how the new SCE object has fewer cells than the original.

      Here is a summary of the syntax used for some common filters:

      Filter onCodeDescription
      CellscolSums(counts(sce)) > xTotal counts per cell greater than x.
      CellscolSums(counts(sce) > x) > yCells with at least y genes having counts greater than x.
      GenesrowSums(counts(sce)) > xTotal counts per gene greater than x.
      GenesrowSums(counts(sce) > x) > yGenes with at least y cells having counts greater than x.

      Exercise 3

      1. Create a new object called tung_filtered which contains:
        • cells with at least 25000 total counts
        • genes that have more than 5 counts in at least half of the cells
      2. How many cells and genes are you left with?

      Answer

      5.3 Visual Data Exploration

      There are several ways to produce visualisations from our data. We will use the ggplot2 package for our plots, together with the Bioconductor package scater, which has some helper functions for retrieving data from a SingleCellExperiment object ready for visualisation.

      As a reminder, the basic components of a ggplot are:

      • A data.frame with data to be plotted
      • The variables (columns of the data.frame) that will be mapped to different aesthetics of the graph (e.g. axis, colours, shapes, etc.)
      • the geometry that will be drawn on the graph (e.g. points, lines, boxplots, violinplots, etc.)

      This translates into the following basic syntax:

      ggplot(data = <data.frame>, 
             mapping = aes(x = <column of data.frame>, y = <column of data.frame>)) +
         geom_<type of geometry>()

      For example, let’s visualise what the distribution of total counts per cell is for each of our batches. This information is stored in colData, so we need to extract it from our object and convert it to a standard data.frame:

      cell_info <- as.data.frame(colData(tung))
      
      head(cell_info)
      ##                individual replicate well      batch      sample_id mean_counts
      ## NA19098.r1.A01    NA19098        r1  A01 NA19098.r1 NA19098.r1.A01    3.328008
      ## NA19098.r1.A02    NA19098        r1  A02 NA19098.r1 NA19098.r1.A02    3.362380
      ## NA19098.r1.A03    NA19098        r1  A03 NA19098.r1 NA19098.r1.A03    2.293057
      ## NA19098.r1.A04    NA19098        r1  A04 NA19098.r1 NA19098.r1.A04    2.833973
      ## NA19098.r1.A05    NA19098        r1  A05 NA19098.r1 NA19098.r1.A05    3.725600
      ## NA19098.r1.A06    NA19098        r1  A06 NA19098.r1 NA19098.r1.A06    3.576549
      ##                total_counts
      ## NA19098.r1.A01        63322
      ## NA19098.r1.A02        63976
      ## NA19098.r1.A03        43630
      ## NA19098.r1.A04        53922
      ## NA19098.r1.A05        70887
      ## NA19098.r1.A06        68051

      Now we are ready to make our plot:

      # load the library
      library(ggplot2)
      
      ggplot(data = cell_info, aes(x = batch, y = total_counts)) +
        geom_violin(fill = 'brown') + theme_bw() + 
        theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))

      What if we wanted to visualise the distribution of expression of a particular gene in each batch? This now gets a little more complicated, because the gene expression information is stored in the counts assay of our SCE, whereas the batch information is in the colData. To bring both of these pieces of information together would require us to do a fair amount of data manipulation to put it all together into a single data.frame. This is where the scater package is very helpful, as it provides us with the ggcells() function that let’s us specify all these pieces of information for our plot.

      For example, the same plot as above could have been done directly from our tung SCE object:

      library(scater)
      ## Loading required package: scuttle
      ggcells(tung, aes(x = batch, y = total_counts)) + 
        geom_violin(fill = 'orange') + theme_bw() + 
        theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))

      If we instead wanted to plot the expression for one of our genes, we could do it as:

      library(scater)
      ggcells(tung, aes(x = batch, y = ENSG00000198938), exprs_values = "logcounts") + 
        geom_violin(fill = 'coral2') + theme_bw() + 
        theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))

      Note that we specified which assay we wanted to use for our expression values (exprs_values option). The default is “logcounts,” so we wouldn’t have had to specify it in this case, but it’s worth knowing that in case you want to visualise the expression from a different assay. The functionality provided by the scater package goes far beyond plotting, it also includes several functions for quality control, which we will return to in the next chapter.

      Exercise 4

      Make a scatterplot showing the relationship between the mean and the variance of the raw counts per cell. (Bonus: also colour the cells by batch.)Hint

      What can you conclude from this data exploration, in terms of selecting highly variable genes for downstream analysis?Answer

      5.4 Overview

      KEY POINTS

      • The SingleCellExperiment (SCE) object is used to store expression data as well as information about our cells (columns) and genes (rows).
      • To create a new SCE object we can use the SingleCellExperiment() function. To read the output from cellranger we can use the dedicated function DropletUtils::read10xCounts().
      • The main parts of this object are:
        • assay – one or more matrices of expression quantification.
          • There is one essential assay named “counts,” which contains the raw counts on which all other analyses are based on.
        • rowData – information about our genes.
        • colData – information about our cells.
        • reducedDim – one or more reduced dimensionality representations of our data.
      • We can access all the parts of this object using functions of the same name. For example assay(sce, "counts") retrieves the counts matrix from the object.
      • We can add/modify parts of this object using the assignment operator <-. For example assay(sce, "logcounts") <- log2(counts(sce) + 1) would add a new assay named “logcounts” to our object.
      • Matrix summary metrics are very useful to explore the properties of our expression data. Some of the more useful functions include rowSums()/colSums() and rowMeans()/colMeans(). These can be used to summarise information from our assays, for example colSums(counts(sce)) would give us the total counts in each cell (columns).
      • Combining matrix summaries with conditional operators (>, <, ==, !=) can be used for conditional subsetting using [.
      • We can use the ggcells() function (from the scater package), to produce ggplot-style plots directly from our SCE object.

      5.4.1 sessionInfo()

      View session info

      ## R version 4.1.0 (2021-05-18)
      ## Platform: x86_64-pc-linux-gnu (64-bit)
      ## Running under: Ubuntu 20.04.2 LTS
      ## 
      ## Matrix products: default
      ## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
      ## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
      ## 
      ## locale:
      ##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
      ##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
      ##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
      ##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
      ##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
      ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
      ## 
      ## attached base packages:
      ## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
      ## [8] methods   base     
      ## 
      ## other attached packages:
      ##  [1] scater_1.20.1               scuttle_1.2.0              
      ##  [3] ggplot2_3.3.5               SingleCellExperiment_1.14.1
      ##  [5] SummarizedExperiment_1.22.0 Biobase_2.52.0             
      ##  [7] GenomicRanges_1.44.0        GenomeInfoDb_1.28.1        
      ##  [9] IRanges_2.26.0              S4Vectors_0.30.0           
      ## [11] BiocGenerics_0.38.0         MatrixGenerics_1.4.0       
      ## [13] matrixStats_0.59.0          knitr_1.33                 
      ## 
      ## loaded via a namespace (and not attached):
      ##  [1] viridis_0.6.1             sass_0.4.0               
      ##  [3] BiocSingular_1.8.1        jsonlite_1.7.2           
      ##  [5] viridisLite_0.4.0         DelayedMatrixStats_1.14.0
      ##  [7] bslib_0.2.5.1             assertthat_0.2.1         
      ##  [9] highr_0.9                 GenomeInfoDbData_1.2.6   
      ## [11] vipor_0.4.5               yaml_2.2.1               
      ## [13] pillar_1.6.1              lattice_0.20-44          
      ## [15] glue_1.4.2                beachmat_2.8.0           
      ## [17] digest_0.6.27             XVector_0.32.0           
      ## [19] colorspace_2.0-2          htmltools_0.5.1.1        
      ## [21] Matrix_1.3-4              pkgconfig_2.0.3          
      ## [23] bookdown_0.22             zlibbioc_1.38.0          
      ## [25] purrr_0.3.4               scales_1.1.1             
      ## [27] ScaledMatrix_1.0.0        BiocParallel_1.26.1      
      ## [29] tibble_3.1.2              generics_0.1.0           
      ## [31] farver_2.1.0              ellipsis_0.3.2           
      ## [33] withr_2.4.2               magrittr_2.0.1           
      ## [35] crayon_1.4.1              evaluate_0.14            
      ## [37] fansi_0.5.0               beeswarm_0.4.0           
      ## [39] tools_4.1.0               lifecycle_1.0.0          
      ## [41] stringr_1.4.0             munsell_0.5.0            
      ## [43] DelayedArray_0.18.0       irlba_2.3.3              
      ## [45] compiler_4.1.0            jquerylib_0.1.4          
      ## [47] rsvd_1.0.5                rlang_0.4.11             
      ## [49] grid_4.1.0                RCurl_1.98-1.3           
      ## [51] BiocNeighbors_1.10.0      bitops_1.0-7             
      ## [53] labeling_0.4.2            rmarkdown_2.9            
      ## [55] gtable_0.3.0              codetools_0.2-18         
      ## [57] DBI_1.1.1                 R6_2.5.0                 
      ## [59] gridExtra_2.3             dplyr_1.0.7              
      ## [61] utf8_1.2.1                stringi_1.7.3            
      ## [63] ggbeeswarm_0.6.0          Rcpp_1.0.7               
      ## [65] vctrs_0.3.8               tidyselect_1.1.1         
      ## [67] xfun_0.24                 sparseMatrixStats_1.4.0

      专题:单细胞RNA-seq测序数据分析:

      • 1 About the course 关于单细胞测序跟练课程
      • 2 单细胞RNA-seq介绍
      • 3 Processing Raw scRNA-Seq Sequencing Data: From Reads to a Count Matrix处理scRNA-seq测序的原始数据:把读取的数据转化为计数矩阵
      • 5 scRNA-seq Analysis with Bioconductor
      • 6 Basic Quality Control (QC) and Exploration of scRNA-seq Datasets
      • 7 Biological Analysis
      • 8 Single cell RNA-seq analysis using Seurat
      • 9 scRNA-seq Dataset Integration
      • 10 Resources
      • 11 References
      • 单细胞RNA-seq测序分析-跟练
      • 谈谈单细胞测序那些事儿
      • 【单细胞技术贴】空间转录组与单细胞转录组的整合分析(上篇)
      • 【单细胞技术贴】空间转录组与单细胞转录组的整合分析(下篇)
      • 【单细胞数据分析】SCENIC 从单细胞数据推断基因调控网络和细胞类型

      请关注“恒诺新知”微信公众号,感谢“R语言“,”数据那些事儿“,”老俊俊的生信笔记“,”冷🈚️思“,“珞珈R”,“生信星球”的支持!

      • 分享:
      作者头像
      xu, xintian

      上一篇文章

      3 Processing Raw scRNA-Seq Sequencing Data: From Reads to a Count Matrix处理scRNA-seq测序的原始数据:把读取的数据转化为计数矩阵
      2021年9月9日

      下一篇文章

      6 Basic Quality Control (QC) and Exploration of scRNA-seq Datasets
      2021年9月9日

      你可能也喜欢

      3-1675088138
      Nature | 单细胞技术揭示衰老细胞与肌肉再生
      30 1月, 2023
      2-1675088548
      lncRNA和miRNA生信分析系列讲座免费视频课和课件资源包,干货满满
      30 1月, 2023
      9-1675131201
      如何快速批量修改 Git 提交记录中的用户信息
      26 1月, 2023

      留言 取消回复

      要发表评论,您必须先登录。

      搜索

      分类

      • R语言
      • TCGA数据挖掘
      • 单细胞RNA-seq测序
      • 在线会议直播预告与回放
      • 数据分析那些事儿分类
      • 未分类
      • 生信星球
      • 老俊俊的生信笔记

      投稿培训

      免费

      alphafold2培训

      免费

      群晖配置培训

      免费

      最新博文

      Nature | 单细胞技术揭示衰老细胞与肌肉再生
      301月2023
      lncRNA和miRNA生信分析系列讲座免费视频课和课件资源包,干货满满
      301月2023
      如何快速批量修改 Git 提交记录中的用户信息
      261月2023
      logo-eduma-the-best-lms-wordpress-theme

      (00) 123 456 789

      weinfoadmin@weinformatics.cn

      恒诺新知

      • 关于我们
      • 博客
      • 联系
      • 成为一名讲师

      链接

      • 课程
      • 事件
      • 展示
      • 问答

      支持

      • 文档
      • 论坛
      • 语言包
      • 发行状态

      推荐

      • iHub汉语代码托管
      • iLAB耗材管理
      • WooCommerce
      • 丁香园论坛

      weinformatics 即 恒诺新知。ICP备案号:粤ICP备19129767号

      • 关于我们
      • 博客
      • 联系
      • 成为一名讲师

      要成为一名讲师吗?

      加入数以千计的演讲者获得100%课时费!

      现在开始

      用你的站点账户登录

      忘记密码?

      还不是会员? 现在注册

      注册新帐户

      已经拥有注册账户? 现在登录

      close
      会员购买 你还没有登录,请先登录
      • ¥99 VIP-1个月
      • ¥199 VIP-半年
      • ¥299 VIP-1年
      在线支付 激活码

      立即支付
      支付宝
      微信支付
      请使用 支付宝 或 微信 扫码支付
      登录
      注册|忘记密码?