      2 Introduction to Single-Cell RNA-seq

      • Published by: xu, xintian
      • Categories: Single-cell RNA-seq sequencing, Uncategorized
      • Date: September 9, 2021

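      Series introduction: single-cell RNA-seq was named a major scientific breakthrough of 2018, but it is in fact not a new technology. By 2015, a commercial single-cell RNA sequencing workflow had already been established, with results published in Cell. The surge of publications this year and the attention the field is receiving are due to the technology having recently become fully commercialized.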

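      QUESTIONS

      • What is single-cell sequencing, and how does it differ from bulk RNA-seq?
      • What are the typical applications of scRNA-seq?
      • How is sample material typically prepared for scRNA-seq?
      • What are the differences between the best-known methods, and what are the advantages and disadvantages of each?
      • What choices of strategy and method need to be made in the experimental design when using scRNA-seq?
      • What challenges does single-cell RNA-seq data pose compared with bulk data?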

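      2.1 Overview of Single-Cell RNA-seq

      RNA-seq profiles the transcripts in a sample in an efficient and cost-effective way. It was a major breakthrough of the late 2000s and has grown steadily in popularity since, largely replacing other transcriptome-profiling technologies such as microarrays. Its success is largely due to the fact that it samples all transcripts in a sample in an unbiased way, rather than being limited to a predetermined set of transcripts, as microarrays are. RNA-seq has typically been applied to samples made up of a mixture of cells, an approach known as bulk RNA-seq, which has many applications. For example, it can characterise gene expression signatures of tissues in different states: healthy or diseased, wild-type or mutant, control or treated. It can also be used in evolutionary studies, by comparing transcription in tissue samples from different species. Besides transcript quantification, it can be used to discover and annotate new genes, isoforms and other transcripts, in both model and non-model organisms.

      However, with bulk RNA-seq we can only estimate the average expression level of each gene across a population of cells, ignoring the heterogeneity in gene expression between the individual cells of the sample. It is therefore insufficient for studying heterogeneous systems (mixtures of cells), such as early development or complex tissues like the brain.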

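      A rough comparison of single-cell and bulk sequencing.

      To overcome this limitation, new protocols were developed that apply RNA sequencing at the single-cell level (scRNA-seq), first published in 2009 (Tang et al. 2009). The technology took off around 2014, when it became cheaper and more accessible. Unlike bulk RNA-seq, scRNA-seq lets us estimate the distribution of expression levels of each gene across a population of cells. This makes it possible to address new biological questions in which cell-specific changes in the transcriptome matter, such as discovering new or rare cell types, identifying differences in cell composition between healthy and diseased tissue, or following the changes cells undergo during development. One of the most popular applications of this technology is building cell atlases (see the box below), which give a comprehensive overview of the diversity of cells in an organism and have wide applications in healthcare and basic research.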

      Single-cell atlases

      Many projects are working to build comprehensive catalogues of the cells in an organism. A few examples (not an exhaustive list):

      • Human Cell Atlas (H. sapiens)
      • Tabula Muris (M. musculus)
      • Fly Cell Atlas (D. melanogaster)
      • Cell Atlas of Worm (C. elegans)
      • Arabidopsis Root Atlas (A. thaliana)

      Figure 2.1: Moore’s law in single cell transcriptomics, showing an increase in the throughput of experiments from tens to millions of cells in just over a decade. (image taken from Svensson et al.)

      2.2 Sample Preparation Protocols

      Broadly speaking, a typical scRNA-seq protocol consists of the following steps (illustrated in the figure below):

      • Tissue dissection and cell dissociation to obtain a suspension of cells.
      • Optionally, selecting cells (e.g. based on membrane markers, fluorescent transgenes or staining dyes).
      • Capturing single cells in individual reaction containers (e.g. wells or oil droplets).
      • Extracting the RNA from each cell.
      • Reverse-transcribing the RNA to more stable cDNA.
      • Amplifying the cDNA (either by in vitro transcription or by PCR).
      • Preparing the sequencing library with adequate molecular adapters.
      • Sequencing, usually with paired-end Illumina protocols.
      • Processing the raw data to obtain a genes-by-cells count matrix.
      • Carrying out several downstream analyses (the focus of this course).

      This course deals mostly with the last step of this workflow, but it is important to consider some of the steps that come before that, as they have an impact on the properties of the data we get.

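To make the end product of this workflow concrete, here is a minimal sketch of a genes-by-cells count matrix. It assumes the reads have already been assigned to a cell and a gene; the `assignments` list, the gene and cell names and the `build_count_matrix` helper are all made up for illustration and are not part of any real pipeline.

```python
from collections import Counter


def build_count_matrix(assignments):
    """Build a genes-by-cells count matrix from (cell, gene) pairs.

    `assignments` is a hypothetical list holding one (cell_id, gene_id)
    tuple per quantified read or molecule.
    """
    counts = Counter(assignments)
    cells = sorted({cell for cell, _ in assignments})
    genes = sorted({gene for _, gene in assignments})
    # One row per gene, one column per cell.
    matrix = [[counts[(cell, gene)] for cell in cells] for gene in genes]
    return genes, cells, matrix


# Toy input: every name here is invented for the example.
assignments = [
    ("cell_1", "GeneA"), ("cell_1", "GeneA"), ("cell_1", "GeneB"),
    ("cell_2", "GeneB"), ("cell_3", "GeneA"), ("cell_3", "GeneC"),
]

genes, cells, matrix = build_count_matrix(assignments)
print("cells:", cells)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

Each row is a gene and each column is a cell, which is the genes-by-cells layout named in the workflow.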

      Schematic of a typical single-cell sequencing workflow. Abbreviations: IVT, in vitro transcription; PCR, polymerase chain reaction; UMI, unique molecular identifier. Image from Lafzi et al. 2018.

      Single-nucleus RNA-seq

      In tissues where cell dissociation is difficult, or in frozen tissue samples, it is possible to isolate single nuclei instead of whole single cells. Apart from the isolation step, the protocol to prepare single-nucleus sequencing libraries is similar to that of single-cell protocols. However, nuclear RNA usually contains a higher proportion of unprocessed RNA, with more of the sequenced transcripts containing introns. This aspect needs to be considered in the data processing steps, which we detail in the following chapter.

      There is currently a wide diversity of protocols for preparing scRNA-seq data, each with its own strengths and weaknesses, which we will come to below. These methods can be categorized in different ways, but the two most important aspects are cell capture or isolation and transcript quantification.

      Comparison of common scRNA-seq protocols. Abbreviations: cDNA, complementary DNA; DNA pol I, DNA polymerase I; FACS, fluorescence-activated cell sorting; PCR, polymerase chain reaction; RNase H, ribonuclease H; RT, reverse transcription; TSO, template-switching oligonucleotide. (source: Chen, Teichmann and Meyer, 2018)

      2.3 Cell Capture

      The strategy used for capturing cells determines the throughput of the experiment (i.e. how many cells we isolate), how the cells are selected prior to sequencing, as well as what kind of additional information besides transcript sequencing can be obtained. The three most widely used options are microtitre-plate-based, microfluidic-array-based and microfluidic-droplet-based methods.

      Figure 2.2: Single cell isolation methods.

      Microtitre-plate methods rely on isolating cells into individual wells of the plate using, for example, pipetting, microdissection or fluorescence-activated cell sorting (FACS). One advantage of well-based methods is that one can take pictures of the cells before library preparation, providing an additional data modality. For example, one can identify and discard damaged cells or find wells containing doublets (wells with two or more cells). When using automatic FACS sorting, it is also possible to associate information such as cell size and the intensity of any labels used with the well coordinates, and therefore with individual cell indices in downstream analysis. The main drawback of these methods is that they are often low-throughput and the amount of work required per cell may be considerable.

      Microfluidic-array platforms, such as Fluidigm’s C1, provide a more integrated system for capturing cells and for carrying out the reactions necessary for the library preparations. Thus, they provide a higher throughput than microtitre-plate-based methods. Typically, only around 10% of cells are captured in a microfluidic platform and thus they are not appropriate if one is dealing with rare cell-types or very small amounts of input. Care also has to be taken with the cell sizes captured by the arrays, as the nanowells are customised for particular sizes (this may therefore affect the unbiased sampling of cells in complex tissues). Moreover, the chip is relatively expensive, but since reactions can be carried out in a smaller volume, money can be saved on reagents.

      Microfluidic-droplet methods offer the highest throughput and are the most popular method used nowadays. They work by encapsulating individual cells inside a nanoliter-sized oil droplet, together with a bead. The bead is loaded with enzymes and other components required to construct the library. In particular, each bead contains a unique barcode which is attached to all of the sequencing reads originating from that cell. Thus, all of the droplets can be pooled and sequenced together, and the reads can subsequently be assigned to the cell of origin based on those barcodes. Droplet platforms have relatively cheap library preparation costs, on the order of 0.05 USD/cell. Instead, sequencing costs often become the limiting factor, and in a typical experiment the coverage is low, with only a few thousand different transcripts detected (Ziegenhain et al. 2017).

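The pooling and re-assignment of reads by barcode can be illustrated with a toy sketch. It assumes, purely for illustration, that each read is a single string whose first four bases are the cell barcode; real protocols typically carry the barcode in a separate read, and the `demultiplex` helper and all sequences here are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_length):
    """Group pooled reads by the cell barcode at the start of each read."""
    by_cell = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        by_cell[barcode].append(insert)
    return dict(by_cell)


# Reads from all droplets pooled together (invented sequences).
pooled_reads = ["AAACGGTCA", "TTTGACCTG", "AAACTTAGC", "TTTGGGATC", "AAACCATGA"]

reads_by_cell = demultiplex(pooled_reads, barcode_length=4)
for barcode, inserts in sorted(reads_by_cell.items()):
    print(barcode, inserts)
```

Because every read carries the barcode of its droplet, the pooled library can be sequenced in one run and split back into cells afterwards.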

      Fluorescence Activated Cell Sorting (FACS) can be used upstream of any of the capture methods, to select a sub-population of cells. A common way in which this is used is to stain the cells with a dye that distinguishes between live and dead cells (e.g. due to membrane rupture), thus enriching the cell suspension with viable cells.

      2.4 Transcript Quantification

      There are two types of transcript quantification: full-length and tag-based. Full-length protocols try to achieve a uniform read coverage across the whole transcript, whereas tag-based protocols only capture either the 5’ or 3’ ends. The choice of quantification method has important implications for what types of analyses the data can be used for.

      Preparing full-length libraries for single-cell is essentially identical to what is done in bulk RNA-seq (Figure below), and is restricted to plate-based protocols such as SMART-seq2. Although in theory full-length protocols should provide an even coverage of transcripts, there can sometimes be biases in the coverage across the gene body (illustrated below). Full-length protocols also allow the detection of splice variants, which is very difficult to do with other protocols.

      Full-length RNA library preparation for Illumina sequencing. Samples are enriched for RNA containing a poly(A) tail, which avoids sequencing rRNA (at the cost of also missing other non-coding RNAs). The RNA is then fragmented and reverse-transcribed to more stable cDNA, Illumina adapters are ligated to each molecule, and the library is finally PCR-amplified. In the case of single-cell RNA-seq, adapters with well-specific barcodes are used, making it possible to identify sequencing reads belonging to individual cells. (source)

      Figure 2.3: Example of 3’ bias in the gene body coverage, after aligning the sequencing reads to the transcriptome. Each line represents the average coverage across all the genes in a cell. In this example, in addition to the 3’ bias across all cells, there are three cells that look like outliers relative to the rest and should be removed from downstream analysis. These may be cells where RNA quality was poorer, e.g. due to degradation.

      With tag-based protocols, only one of the ends (3’ or 5’) of the transcript is sequenced. The main advantage of tag-based protocols is that they can be combined with unique molecular identifiers (UMIs), which can help improve the accuracy of transcript quantification. The reason for this improvement has to do with the PCR amplification step during library preparation, which creates several duplicate copies of each molecule. Because this amplification is exponential, molecules may be disproportionately represented in the final library, leading to over-estimation of their expression due to these PCR duplicates. To address this problem, each captured molecule is tagged, alongside the cell barcode, with a random nucleotide sequence, the UMI, which is therefore unique to a single molecule. This UMI is part of the sequencing read and can then be computationally taken into account when quantifying the transcript’s abundance. Most current scRNA-seq protocols are tag-based, including the popular droplet-based 10x Chromium protocol, illustrated in the figure below. One disadvantage of tag-based protocols is that, being restricted to one end of the transcript only, they reduce our ability to unambiguously align reads to a transcript and make it difficult to distinguish different isoforms (Archer et al. 2016).

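A minimal sketch of why UMIs help: reads that share the same cell barcode, gene and UMI are treated as PCR duplicates of one original molecule and counted once. The `count_umis` helper and the toy reads are hypothetical, and the sketch ignores sequencing errors within UMIs, which real quantification tools have to handle.

```python
from collections import Counter, defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene), collapsing PCR duplicates."""
    umis_seen = defaultdict(set)
    for barcode, umi, gene in reads:
        umis_seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


# Toy reads as (cell barcode, UMI, gene) tuples; all values are made up.
reads = [
    ("AAAC", "GTCA", "GeneA"),
    ("AAAC", "GTCA", "GeneA"),  # PCR duplicate of the read above
    ("AAAC", "GTCA", "GeneA"),  # another PCR duplicate
    ("AAAC", "TTAG", "GeneA"),  # a second GeneA molecule in the same cell
    ("AAAC", "CCGT", "GeneB"),
    ("TTTG", "GTCA", "GeneA"),  # same UMI, different cell: a different molecule
]

read_counts = Counter((barcode, gene) for barcode, _, gene in reads)
umi_counts = count_umis(reads)

print("reads:", read_counts[("AAAC", "GeneA")])  # inflated by duplicates
print("UMIs: ", umi_counts[("AAAC", "GeneA")])   # distinct molecules
```

Counting reads would credit GeneA in cell AAAC with four observations, while counting UMIs recovers the two molecules that were actually captured.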

      Protocol overview of 3’ libraries using the 10x Chromium protocol. Cells are captured in individual oil droplets (called GEMs), each containing a bead. An individual bead contains adapters with a common barcode, but diverse and distinct Unique Molecular Identifier (UMI) sequences. A poly(dT) primer is used to reverse-transcribe mRNA with poly-A tails into cDNA. The GEMs are then broken and the pooled cDNA (from all barcoded cells) is amplified by PCR. Finally, the cDNA is fragmented and another Illumina adapter is ligated at the other end of the molecule. The final library is composed of one read containing the cell-specific barcode (used to identify reads from different cells) and a molecule-specific UMI (used to quantify a gene’s expression), while the second read contains sequence from the actual cDNA molecule and can be used to align it to a reference transcriptome. (source: Chromium Next GEM Single Cell 3ʹ User Guide)

      5’ or 3’?

      The difference between 5’ and 3’ tag-based protocols is which end of the transcript is sequenced. Although 3’ protocols are more commonly used, many protocols now allow sequencing from either end (e.g. 10x Chromium supports both). The advantage of 5’-end sequencing is that we obtain information about the transcription start site (TSS), which allows us to explore whether there is differential TSS usage across cells.

      2.5 Experimental Design

      Several considerations need to be taken into account when performing scRNA-seq experiments. Factors such as the cost per cell, how many cells one needs, or how deeply to sequence each cell may all influence our choice of protocol. In addition, care has to be taken to avoid biases due to batches being processed at different times, and a lack of adequate replication may constrain the types of analysis that can be done and therefore limit our ability to answer some questions of interest.

      2.5.1 What Protocol Should I Choose?

      The most suitable platform depends on the biological question at hand. For example, if one is interested in characterizing the composition of a heterogeneous tissue, then a droplet-based method is more appropriate, as it allows a very large number of cells to be captured in a mostly unbiased manner. On the other hand, if one is interested in characterizing a specific cell-population for which there is a known surface marker, then it is probably best to enrich using FACS and then sequence a smaller number of cells at higher sequencing depth.

      Clearly, full-length transcript quantification will be more appropriate if one is interested in studying different isoforms, since tagged protocols are much more limited in this regard. By contrast, UMIs can only be used with tagged protocols and they can improve gene-level quantification.

      If one is interested in rare cell types (for which known markers are not available), then more cells need to be sequenced, which will increase the cost of the experiment. A useful tool to estimate how many cells to sequence has been developed by the Satija Lab: https://satijalab.org/howmanycells/.

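As a rough illustration of this kind of estimate (not the method used by the Satija Lab tool, which is not described here), one can assume that cells are sampled independently and ask how likely it is to capture at least a given number of cells of a rare type. The functions and the example frequency below are assumptions made for this sketch.

```python
from math import comb


def prob_at_least(n_cells, frequency, min_count):
    """Probability of capturing at least `min_count` cells of a type that
    makes up `frequency` of the population, among `n_cells` sequenced cells.

    Assumes independent sampling (a binomial model).
    """
    p_fewer = sum(
        comb(n_cells, k) * frequency**k * (1 - frequency) ** (n_cells - k)
        for k in range(min_count)
    )
    return 1 - p_fewer


def cells_needed(frequency, min_count, confidence=0.95):
    """Smallest number of cells giving at least `confidence` probability."""
    n_cells = min_count
    while prob_at_least(n_cells, frequency, min_count) < confidence:
        n_cells += 1
    return n_cells


# Hypothetical question: a cell type at 1% frequency, at least 10 cells wanted.
print(cells_needed(frequency=0.01, min_count=10, confidence=0.95))
```

The rarer the cell type, or the more cells of it one wants to observe, the faster the required number of sequenced cells grows, which is why rare populations drive up the cost of an experiment.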

      Another way to decide which method to use is to rely on studies dedicated to comparing different protocols. These studies focus on issues such as sensitivity (how many genes are detected per cell), accuracy (e.g. compared to bulk RNA-seq) and the ability to recover all cell types present in a sample (tested on commercially available cell mixtures). For example, a study by Ding et al. 2020 illustrates how low-throughput methods have higher sensitivity compared to high-throughput methods, such as 10x Chromium (Figure below). On the other hand, low-throughput methods did not capture some of the rarer cell types in their samples, leading to an incomplete characterisation of the cell population.

      Transcript detection sensitivity of different methods in a commercial mixture of peripheral blood mononuclear cells (PBMCs). The figure is taken from Ding et al. and shows a) the number of distinct UMIs detected per cell (for methods using tag-based transcript quantification) and b) the number of detected genes per cell across methods. Results from two experimental replicates are shown.

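The two sensitivity measures named in this caption, UMIs per cell and detected genes per cell, can be computed directly from a count matrix. The sketch below uses a made-up genes-by-cells matrix and a hypothetical helper; it is only meant to show what is being counted.

```python
def sensitivity_per_cell(matrix):
    """Return (total UMIs, number of detected genes) for each cell.

    `matrix` is a genes-by-cells list of lists: rows are genes,
    columns are cells.
    """
    return [
        (sum(column), sum(1 for value in column if value > 0))
        for column in zip(*matrix)
    ]


# Toy UMI counts: 4 genes (rows) by 3 cells (columns), invented values.
umi_matrix = [
    [4, 0, 1],
    [0, 0, 3],
    [6, 2, 0],
    [0, 0, 0],
]

per_cell = sensitivity_per_cell(umi_matrix)
for index, (n_umis, n_genes) in enumerate(per_cell, start=1):
    print(f"cell {index}: {n_umis} UMIs, {n_genes} genes detected")
```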

      Another study (Ziegenhain et al. 2017) compared five different protocols on the same sample of mouse embryonic stem cells (mESCs), reaching similar conclusions. Finally, Svensson et al. (2017) used synthetic transcripts (spike-ins) with known concentrations to measure the accuracy and sensitivity of different protocols. Comparing a wide range of studies, they also reported substantial differences between the protocols (Figure below).

      Figure 2.4: Figure from Svensson et al., comparing different protocols in relation to their a) accuracy (measured as the Pearson’s correlation with bulk RNA-seq data) and b) sensitivity (number of detected molecules).

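Accuracy in this comparison is expressed as a Pearson correlation. As a minimal sketch, the correlation between a reference expression profile and the profile measured by a protocol could be computed as follows; the `pearson` helper and both toy profiles are invented for illustration and are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical per-gene expression values: a reference and a measurement.
reference = [10.0, 200.0, 35.0, 0.0, 80.0]
measured = [8.0, 150.0, 40.0, 1.0, 95.0]

print(round(pearson(reference, measured), 3))
```

A value close to 1 means the measured profile tracks the reference closely; lower values indicate a less accurate protocol under this measure.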

      As protocols are developed and improved, and new computational methods for quantifying the technical noise emerge, it is likely that future studies will help us gain further insights regarding the strengths of the different methods. These comparative studies are helpful not only to decide on which protocol to use, but also for developing new methods as the benchmarking makes it possible to determine what strategies are the most useful ones.

      Besides differences in throughput and sensitivity between protocols, cost may also be a deciding factor when planning a scRNA-seq experiment. It is difficult to precisely estimate how much an experiment will cost, although we point to this tool from the Satija Lab as a starting point: https://satijalab.org/costpercell/. For example, some droplet-based protocols such as Drop-seq are cheaper than the commercial alternatives such as 10x Chromium. However, they require the labs to be equipped to prepare the libraries, as well as trained staff and dedicated time (costing salary money).

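A back-of-the-envelope version of such an estimate splits the cost into library preparation and sequencing. In the sketch below, the library cost of 0.05 USD per cell is the droplet figure quoted earlier in this chapter, while the number of cells, the reads per cell and the price per million reads are placeholder assumptions rather than real quotes; staff time is not included.

```python
def experiment_cost(n_cells, reads_per_cell, library_cost_per_cell,
                    cost_per_million_reads):
    """Return (library cost, sequencing cost) in the currency of the inputs."""
    library = n_cells * library_cost_per_cell
    sequencing = n_cells * reads_per_cell / 1_000_000 * cost_per_million_reads
    return library, sequencing


library, sequencing = experiment_cost(
    n_cells=10_000,               # placeholder
    reads_per_cell=50_000,        # placeholder sequencing depth
    library_cost_per_cell=0.05,   # droplet figure quoted in this chapter
    cost_per_million_reads=2.0,   # placeholder price, not a real quote
)
print(f"library: {library:.2f} USD, sequencing: {sequencing:.2f} USD")
```

With numbers of this kind the sequencing term dominates, which matches the remark that sequencing, rather than library preparation, often becomes the limiting factor for droplet methods.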

      Methods such as cell hashing (Stoeckius et al.) may further reduce the costs of sequencing using current platforms. This method in particular consists of attaching oligo-tags to cell membranes, allowing more cells from multiple samples to be loaded per experiment, which can later be demultiplexed during the analysis.

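Demultiplexing hashed cells amounts to asking which sample tag dominates in each cell. The sketch below uses a simple ratio threshold chosen for illustration; it is not the statistical procedure of Stoeckius et al., and the tag names, counts and `assign_sample` helper are hypothetical.

```python
def assign_sample(tag_counts, min_ratio=3.0):
    """Assign a cell to a sample from its per-tag counts.

    Returns the dominant tag, "doublet" when the two strongest tags are
    too similar, or "negative" when no tag was detected.
    """
    ranked = sorted(tag_counts.items(), key=lambda item: item[1], reverse=True)
    top_tag, top_count = ranked[0]
    second_count = ranked[1][1] if len(ranked) > 1 else 0
    if top_count == 0:
        return "negative"
    if second_count > 0 and top_count / second_count < min_ratio:
        return "doublet"
    return top_tag


# Invented tag counts for four cells loaded from three pooled samples.
cells = {
    "cell_1": {"tag_A": 120, "tag_B": 4, "tag_C": 2},
    "cell_2": {"tag_A": 3, "tag_B": 95, "tag_C": 1},
    "cell_3": {"tag_A": 60, "tag_B": 55, "tag_C": 2},
    "cell_4": {"tag_A": 0, "tag_B": 0, "tag_C": 0},
}

calls = {cell: assign_sample(counts) for cell, counts in cells.items()}
for cell, call in calls.items():
    print(cell, call)
```

Cells carrying two strong tags are likely droplets that captured cells from two samples, so flagging them is a side benefit of pooling hashed samples.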

      2.5.2 Data Challenges

      The main difference between bulk and single cell RNA-seq is that each sequencing library represents a single cell, instead of a population of cells. Therefore, there is no way to have “biological replicates” at a single-cell level: each cell is unique and impossible to replicate. Instead, cells can be clustered by their similarity, and comparisons can then be done across groups of similar cells (as we shall see later in the course).

      Another big challenge in single-cell RNA-seq is that we have a very low amount of starting material per cell. This results in very sparse data, where most of the genes remain undetected and so our data contains many zeros. These may either be due to the gene not being expressed in the cell (a “real” zero) or the gene was expressed but we were unable to detect it (a “dropout”). This leads to cell-to-cell variation that is not always biological but rather due to technical issues caused by uneven PCR amplification across cells and gene “dropouts” (where a gene is detected in one cell but absent from another (Kharchenko, Silberstein, and Scadden 2014)). Improving the transcript capture efficiency and reducing the amplification bias are solutions for these problems and still active areas of technical research. However, as we shall see in this course, it is possible to alleviate some of these issues through proper data normalisation.

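To make the sparsity and the idea of normalisation concrete, the sketch below computes the fraction of zeros in a toy count matrix and applies a simple library-size normalisation followed by a log transform. This is one common, basic approach chosen for illustration, not necessarily the method used later in the course; the matrix values and helper names are made up.

```python
from math import log1p


def fraction_zeros(matrix):
    """Fraction of entries in a genes-by-cells matrix that are zero."""
    values = [value for row in matrix for value in row]
    return sum(1 for value in values if value == 0) / len(values)


def normalise(matrix, scale=10_000):
    """Scale each cell (column) to `scale` total counts, then take log(1 + x)."""
    totals = [sum(column) for column in zip(*matrix)]
    return [
        [
            log1p(value / total * scale) if total else 0.0
            for value, total in zip(row, totals)
        ]
        for row in matrix
    ]


# Toy counts: 4 genes (rows) by 3 cells (columns), invented values.
counts = [
    [4, 0, 1],
    [0, 0, 3],
    [6, 2, 0],
    [0, 0, 0],
]

sparsity = fraction_zeros(counts)
normalised = normalise(counts)
print(f"fraction of zeros: {sparsity:.2f}")
for row in normalised:
    print([round(value, 2) for value in row])
```

Scaling by each cell's total removes differences that come only from how many molecules were captured per cell. It cannot tell a "real" zero from a dropout, since both stay at zero.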

      Another important aspect to take into account is batch effects. These can be observed even when sequencing the same material using different technologies (figure below) and, if not properly normalised, can lead to incorrect conclusions.

      The same cell population was sequenced with three different single-cell protocols (colours). Adapted from Zhang et al.

      The processing of samples should also be done in a manner that avoids confounding between experimentally controlled variables (such as a treatment, a genotype or a disease state) and the time when the samples are prepared and sequenced. For example, if planning an experiment to compare healthy and diseased tissues from 10 patients each, if only 10 samples can be processed per day, it is best to do 5 healthy + 5 diseased together each day, rather than prepare all healthy samples one day and all diseased samples in another (figure). Another consideration is to ensure that there is replication of tissue samples. For example, when collecting tissue from an organ, it may be a good idea to take multiple samples from different parts of the organ. Or consider the time of day when samples/replicates are collected (due to possible circadian changes in gene expression). In summary, all the common best practices in experimental design should be taken into account when performing scRNA-seq.

      Illustration of confounded (top panels) and balanced (bottom panels) designs. Shapes denote different sample types (e.g. tissues or patients) and colours denote processing batches. In the confounded design it’s impossible to disentangle biological variation from variation due to the processing batch. In the balanced design, by using tissue replicates and mixing them across batches, it is possible to distinguish between biological and batch-related variation. Figure from Hicks et al.

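The balanced design can be sketched as a simple allocation rule: spread the samples of each condition evenly over the processing batches. The `balanced_batches` helper and sample names below are hypothetical; in a real experiment one would also randomise the order within each condition.

```python
from collections import Counter, defaultdict


def balanced_batches(samples, n_batches):
    """Distribute (sample_id, condition) pairs over batches so that each
    condition is spread as evenly as possible across the batches."""
    by_condition = defaultdict(list)
    for sample_id, condition in samples:
        by_condition[condition].append(sample_id)

    batches = [[] for _ in range(n_batches)]
    slot = 0
    for condition in sorted(by_condition):
        for sample_id in by_condition[condition]:
            # Deal samples out in turn, like cards, one batch at a time.
            batches[slot % n_batches].append((sample_id, condition))
            slot += 1
    return batches


# The example from the text: 10 healthy and 10 diseased samples, 10 per day.
samples = [(f"H{i}", "healthy") for i in range(1, 11)]
samples += [(f"D{i}", "diseased") for i in range(1, 11)]

batches = balanced_batches(samples, n_batches=2)
for day, batch in enumerate(batches, start=1):
    print(f"day {day}:", dict(Counter(condition for _, condition in batch)))
```

Each processing day ends up with five healthy and five diseased samples, so a difference between days cannot be mistaken for a difference between conditions.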

      2.6 Summary

      KEY POINTS

      • scRNA-seq is ideally suited to study heterogeneous populations of cells. For example to identify the types of cells that compose a tissue, define “transcriptional fingerprints” for different cell types, study cell differentiation, explore changes in cell composition due to disease or environmental factors, amongst others.
      • A typical sample preparation workflow consists of isolating single cells (or nuclei), converting the RNA into cDNA, preparing a sequencing library (Illumina) and sequencing.
      • Many single-cell protocols have been developed, some openly available, others provided commercially. These mainly differ in their throughput (how many cells are captured per experiment), the type of quantification (full-length or tag-based) and also cost.
      • SMART-seq2 is a popular low-throughput method, providing full-length transcript quantification. It is ideally suited for studying a smaller group of cells in greater detail (e.g. differential isoform usage, characterisation of lowly-expressed transcripts).
      • 10x Chromium is a popular high-throughput method, using UMIs for transcript quantification (from either 3’ or 5’ ends). It is ideally suited to study highly heterogeneous tissues and sample large populations of cells at scale.
      • When planning an experiment, care should be taken to avoid confounding due to batch effects as well as ensuring an adequate level of replication to address questions of interest.

      References

      Archer, Nathan, Mark D. Walsh, Vahid Shahrezaei, and Daniel Hebenstreit. 2016. “Modeling Enzyme Processivity Reveals That RNA-Seq Libraries Are Biased in Characteristic and Correctable Ways.” Cell Systems 3 (5): 467–479.e12. https://doi.org/10.1016/j.cels.2016.10.012.

      Kharchenko, Peter V, Lev Silberstein, and David T Scadden. 2014. “Bayesian Approach to Single-Cell Differential Expression Analysis.” Nat Meth 11 (7): 740–42. https://doi.org/10.1038/nmeth.2967.

      Svensson, Valentine, Kedar Nath Natarajan, Lam-Ha Ly, Ricardo J Miragaia, Charlotte Labalette, Iain C Macaulay, Ana Cvejic, and Sarah A Teichmann. 2017. “Power Analysis of Single-Cell RNA-Sequencing Experiments.” Nat Meth 14 (4): 381–87. https://doi.org/10.1038/nmeth.4220.

      Tang, Fuchou, Catalin Barbacioru, Yangzhou Wang, Ellen Nordman, Clarence Lee, Nanlan Xu, Xiaohui Wang, et al. 2009. “mRNA-Seq Whole-Transcriptome Analysis of a Single Cell.” Nat Meth 6 (5): 377–82. https://doi.org/10.1038/nmeth.1315.

      Ziegenhain, Christoph, Beate Vieth, Swati Parekh, Björn Reinius, Amy Guillaumet-Adkins, Martha Smets, Heinrich Leonhardt, Holger Heyn, Ines Hellmann, and Wolfgang Enard. 2017. “Comparative Analysis of Single-Cell RNA Sequencing Methods.” Molecular Cell 65 (4): 631–643.e4. https://doi.org/10.1016/j.molcel.2017.01.023.
