[Single-Cell Data Analysis] SCENIC: Inferring Gene Regulatory Networks and Cell Types from Single-Cell Data
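About this series: single-cell RNA-seq was named a major scientific advance of 2018, but it is actually an established technology. By 2015, a commercial single-cell RNA sequencing workflow had already been set up, with the results published in Cell. This year's surge in publications and attention is due to the technology recently becoming fully commercialized.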
10k PBMCs (10x Genomics) data
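This tutorial shows the steps for analyzing a typical scRNA-seq dataset from a single sample. We will use the PBMC data available from the 10x Genomics support website. The same dataset was used with the DSL1 version of the pipeline, as described in the SCENIC protocol tutorial.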
Prepare the 10x input data
The "Feature / cell matrix (filtered)" data is downloaded from the 10x Genomics website:
wget http://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_v3/pbmc_10k_v3_filtered_feature_bc_matrix.tar.gz
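When 10x data is used as input, the pipeline assumes that the files are in the typical Cell Ranger directory structure. This is not the case when the processed counts are downloaded from the 10x website, so we put them into the expected layout: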
mkdir -p pbmc10k/outs/
tar xvf pbmc_10k_v3_filtered_feature_bc_matrix.tar.gz -C pbmc10k/outs/
This results in:
$ tree pbmc10k
pbmc10k
└── outs
    └── filtered_feature_bc_matrix
        ├── barcodes.tsv.gz
        ├── features.tsv.gz
        └── matrix.mtx.gz

2 directories, 3 files
Therefore, in the Nextflow config file generated in the next step, the tenx input channel should point to the outs folder. For example:
params.data.tenx.cellranger_mex = '/home/cflerin/analysis/pbmc10k/dsl2_0.19.0/pbmc10k/outs'
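The layout step can also be scripted. The following is a minimal Python sketch, not part of VSN-pipelines itself: it unpacks the downloaded tarball under a sample folder's outs directory and confirms that the three matrix files from the listing are present. The function name extract_to_cellranger_layout and the constant EXPECTED_FILES are hypothetical names introduced only for this illustration.

```python
import tarfile
from pathlib import Path

# The three files shown in the `tree` listing of the extracted data.
EXPECTED_FILES = ("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")


def extract_to_cellranger_layout(tarball, sample_dir):
    """Unpack a 10x filtered-matrix tarball into <sample_dir>/outs.

    Returns the outs folder, i.e. the path the tenx input channel is
    expected to point to. Raises FileNotFoundError if any of the
    expected files is missing after extraction.
    """
    outs = Path(sample_dir) / "outs"
    outs.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p`

    with tarfile.open(tarball, "r:gz") as tar:
        tar.extractall(outs)  # equivalent of `tar xvf ... -C <outs>`

    matrix_dir = outs / "filtered_feature_bc_matrix"
    missing = [name for name in EXPECTED_FILES
               if not (matrix_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(
            f"missing in {matrix_dir}: {', '.join(missing)}")
    return outs
```

Called from the download folder as extract_to_cellranger_layout("pbmc_10k_v3_filtered_feature_bc_matrix.tar.gz", "pbmc10k"), this should return pbmc10k/outs, whose absolute path is the value to use for params.data.tenx.cellranger_mex.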
Set up the VSN-pipelines project
Update the repository
Pull/update the vsn-pipelines repository in Nextflow's cache. The -r flag can be added to specify the pipeline version to use:
nextflow pull vib-singlecell-nf/vsn-pipelines
Build the config file
We use a combination of profiles to build the config file:
- tenx: defines the input data type
- single_sample_scenic: loads the basic parameters to run the single_sample and scenic workflows
- scenic_use_cistarget_motifs and scenic_use_cistarget_tracks: includes parameters to specify the location of the cistarget database files
- hg38: specifies the genome. Other options are: hg19, dm6, mm10.
- singularity (or docker): specifies container system to use to run the processes
nextflow config vib-singlecell-nf/vsn-pipelines \
    -profile tenx,single_sample_scenic,scenic_use_cistarget_motifs,scenic_use_cistarget_tracks,hg38,singularity \
    > pbmc10k.vsn-pipelines.complete.config
Important variables to check in the config:
- singularity.runOptions (or docker.runOptions): making sure the correct volume mounts are specified (requires the user home folder (included by default in Singularity), and the location of the data).
- params.global.project_name (optional): will control the naming of the output files.
- params.sc.scope.tree.level_${X} (optional): controls the labeling of the loom file when uploaded to the SCope viewer.
- params.sc.scanpy.filter: filtering settings for the Scanpy steps.
- params.sc.scanpy.feature_selection: controls how highly variable genes are selected.
- params.sc.scanpy.clustering: controls cluster settings. In the example here, we select two clustering resolutions by using resolutions = [0.4,0.8].
Specifying compute resource usage in the config:
- The global executor (process.executor) is set to local by default. It can be changed to qsub, etc. to run specific processes as jobs. The executor parameter can be added to specific labels to run only these processes as jobs. Typically the GRN step should be submitted as a job (compute_resources__scenic_grn).
- The number of cpus and memory usage can be adjusted for each label.
The complete config file used here is available at: pbmc10k/pbmc10k.vsn-pipelines.complete.config.
Run the VSN-pipelines project
First pass
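Although the overall goal is to run the "best practices" steps together with SCENIC, we can first skip SCENIC and focus on getting the filtering and preprocessing steps right. We can then move on to the resource-intensive SCENIC steps. Even though we created a config file that includes both the single_sample and scenic options, we can run only the single_sample workflow first: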
nextflow -C pbmc10k.vsn-pipelines.complete.config \
    run vib-singlecell-nf/vsn-pipelines \
    -entry single_sample
Now, the QC reports can be inspected (see out/notebooks/intermediate/pbmc10k.SC_QC_filtering_report.html, either the original ipynb, or the converted html file). The cell and gene filters can be updated by editing the config file. For example, the relevant filters used here are:
params {
    sc {
        scanpy {
            filter = {
                cellFilterMinNGenes = 200
                cellFilterMaxNGenes = 4000
                cellFilterMaxPercentMito = 0.15
                geneFilterMinNCells = 3
            }
        }
    }
}
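To make the effect of these thresholds concrete, here is a minimal Python sketch that applies them to a few made-up cells. It is not the pipeline's own code (the pipeline does its filtering in the Scanpy steps). The function names, the per-cell inputs n_genes and percent_mito, the inclusive bounds, and the reading of cellFilterMaxPercentMito as a fraction (0.15 meaning 15%) are all assumptions made for illustration.

```python
# Thresholds copied from the filter settings shown in the config.
FILTER = {
    "cellFilterMinNGenes": 200,
    "cellFilterMaxNGenes": 4000,
    "cellFilterMaxPercentMito": 0.15,
    "geneFilterMinNCells": 3,
}


def cell_passes(n_genes, percent_mito, params=FILTER):
    """Return True if a cell satisfies all three cell-level thresholds.

    Assumptions: bounds are inclusive, and percent_mito is a fraction
    of counts (0.15 means 15%), not a percentage.
    """
    return (
        params["cellFilterMinNGenes"] <= n_genes <= params["cellFilterMaxNGenes"]
        and percent_mito <= params["cellFilterMaxPercentMito"]
    )


def gene_passes(n_cells, params=FILTER):
    """Return True if a gene is detected in enough cells (inclusive bound assumed)."""
    return n_cells >= params["geneFilterMinNCells"]


# Made-up example cells: (barcode, genes detected, mitochondrial fraction).
cells = [
    ("cell_a", 150, 0.05),   # too few genes
    ("cell_b", 1800, 0.08),  # within all thresholds
    ("cell_c", 5200, 0.04),  # too many genes
    ("cell_d", 900, 0.30),   # mitochondrial fraction too high
]
kept = [barcode for barcode, n_genes, mito in cells if cell_passes(n_genes, mito)]
print(kept)  # ['cell_b']
```

Comparing counts like these against the distributions in the QC report is one way to judge whether a threshold is too strict or too loose before editing the config.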
Re-run the pipeline as many times as needed (with -resume to skip already-completed steps) to select the proper filters:
nextflow -C pbmc10k.vsn-pipelines.complete.config \
    run vib-singlecell-nf/vsn-pipelines \
    -entry single_sample \
    -resume
Second pass
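Once the cell and gene filters look good, we can restart the pipeline with the full SCENIC steps enabled. This re-runs any steps whose parameters have changed (such as filtering and the steps downstream of it), while the -resume option lets it skip the initial conversion and other unchanged steps: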
nextflow -C pbmc10k.vsn-pipelines.complete.config \
    run vib-singlecell-nf/vsn-pipelines \
    -entry single_sample_scenic \
    -resume
Results
When the pipeline completes (about 2 hours on an HPC system, using 15 processes for the SCENIC GRN step), the output will be the following files (listing truncated):
$ tree out
out/
├── data
│   ├── intermediate
│   │   └── [...]
│   └── pbmc10k.PBMC10k_DSL2.single_sample.output.h5ad
├── loom
│   ├── pbmc10k.SCENIC_SCope_output.loom
│   └── pbmc10k.SCope_output.loom
├── nextflow_reports
│   ├── execution_report.html
│   ├── execution_timeline.html
│   ├── execution_trace.txt
│   └── pipeline_dag.dot
├── notebooks
│   ├── intermediate
│   ├── pbmc10k.merged_report.html
│   ├── pbmc10k.merged_report.ipynb
│   ├── pbmc10k.merged_report.louvain_0.4.html
│   ├── pbmc10k.merged_report.louvain_0.4.ipynb
│   ├── pbmc10k.merged_report.louvain_0.8.html
│   └── pbmc10k.merged_report.louvain_0.8.ipynb
└── scenic
    └── pbmc10k
        ├── arboreto_with_multiprocessing
        │   ├── pbmc10k__adj.tsv
        │   └── pbmc10k.filtered.loom
        ├── aucell
        │   ├── pbmc10k__auc_mtf.loom
        │   ├── pbmc10k__auc_trk.loom
        │   └── pbmc10k.filtered.loom
        ├── cistarget
        │   ├── pbmc10k.filtered.loom
        │   ├── pbmc10k__reg_mtf.csv
        │   └── pbmc10k__reg_trk.csv
        ├── notebooks
        │   ├── SCENIC_report.html
        │   └── SCENIC_report.ipynb
        ├── SCENIC_output.loom
        └── SCENIC_SCope_output.loom
The final SCENIC output is packaged into a loom file, which includes the results of the parallel expression analysis (based on highly variable genes). This can be found at out/loom/pbmc10k.SCENIC_SCope_output.loom, and is ready to be uploaded to a SCope session. The output loom file from this analysis can be found on the SCENIC protocol SCope session.
Also included is out/data/pbmc10k.PBMC10k_DSL2.single_sample.output.h5ad, an anndata file generated by the Scanpy section of the pipeline, including the results of the expression analysis (but not results from SCENIC).
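Before uploading anything to SCope, it can be convenient to confirm that the main result files exist. The sketch below is a small Python check based only on the file names in the output listing. The helper names are hypothetical, and two things are assumptions drawn from this single run rather than documented pipeline behaviour: that the PBMC10k_DSL2 part of the h5ad name is the project name, and that one louvain report is written per configured clustering resolution.

```python
from pathlib import Path


def expected_outputs(sample="pbmc10k", project="PBMC10k_DSL2",
                     resolutions=(0.4, 0.8)):
    """List key result files, relative to the pipeline's out/ folder."""
    paths = [
        f"data/{sample}.{project}.single_sample.output.h5ad",
        f"loom/{sample}.SCENIC_SCope_output.loom",
        f"loom/{sample}.SCope_output.loom",
        f"notebooks/{sample}.merged_report.html",
    ]
    # Assumed naming: one report per clustering resolution.
    paths += [f"notebooks/{sample}.merged_report.louvain_{res}.html"
              for res in resolutions]
    return paths


def missing_outputs(out_dir="out", **kwargs):
    """Return the expected files that are not present under out_dir."""
    out_dir = Path(out_dir)
    return [rel for rel in expected_outputs(**kwargs)
            if not (out_dir / rel).is_file()]
```

Running missing_outputs("out") in the folder where the pipeline was launched should return an empty list if all of these files were written.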