• 主页
  • 课程

    关于课程

    • 课程归档
    • 成为一名讲师
    • 讲师信息
    同等学历教学

    同等学历教学

    免费
    阅读更多
  • 特色
    • 展示
    • 关于我们
    • 问答
  • 事件
  • 个性化
  • 博客
  • 联系
  • 站点资源
    有任何问题吗?
    (00) 123 456 789
    weinfoadmin@weinformatics.cn
    注册登录
    恒诺新知
    • 主页
    • 课程

      关于课程

      • 课程归档
      • 成为一名讲师
      • 讲师信息
      同等学历教学

      同等学历教学

      免费
      阅读更多
    • 特色
      • 展示
      • 关于我们
      • 问答
    • 事件
    • 个性化
    • 博客
    • 联系
    • 站点资源

      未分类

      • 首页
      • 博客
      • 未分类
      • R/Seurat包对数据处理03-b数据整合分析

      R/Seurat包对数据处理03-b数据整合分析

      • 发布者 weinfoauthor
      • 分类 未分类
      • 日期 2021年8月11日
      • 评论 0评论

      回想起来自己从事生物相关的研究已经大概15年了,从研究生进入实验室也有10年时间,陆续从硕士,博士到博后,研究地点也从化学学院,到药学院再到医院科室。自己做的研究是“干-湿”实验结合的,发表的成果也是各自一半,但是综合起来还是生物信息分析的文章发表的影响因子高一些。到现在由于工作场所频繁发生变化,反而没有稳定的场所做实验,所以愈发的在生物信息方面下较多的功夫。因此我对这十几年来的生信研究进行总结,希望帮助新手克服生物信息陡峭的学习曲线,当然我自己也不是科班出身的,也希望与你一起交流学习。所有的内容均是以自己的实验数据(会明确下载地址给读者)操作来进行,避免某些在demo运行很好却在自己的环境中出现bug的情况。最后一点,现在通讯太发达了,欢迎大家与我直接交流共同进步。
      9 scRNA-seq Dataset Integration | Analysis of single cell RNA-seq data
      大部分的情况下我们会碰到需要将数据整合分析的场景,比如不同批次测的同一个样本的单细胞测序,这样做的好处是可以获得更多的细胞数量方便分析获得更多的数据。

      library(Seurat)
      library(SeuratDisk)
      library(SeuratWrappers)
      library(patchwork)
      library(harmony)
      library(rliger)
      library(reshape2)
      library(RColorBrewer)
      library(dplyr)
      # 获取数据
      download.file("https://cf.10xgenomics.com/samples/cell-exp/4.0.0/Parent_NGSC3_DI_PBMC/Parent_NGSC3_DI_PBMC_filtered_feature_bc_matrix.h5",
                    destfile = "3p_pbmc10k_filt.h5")
      download.file("https://cf.10xgenomics.com/samples/cell-vdj/5.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_filtered_feature_bc_matrix.h5",
                    destfile = "5p_pbmc10k_filt.h5")
      # Let’s read the data and create the appropriate Seurat objects. Note that 5’ dataset has other assays - namely, VDJ data.
      matrix_3p <- Read10X_h5("data/update/3p_pbmc10k_filt.h5",use.names = T)
      matrix_5p <- Read10X_h5("data/update/5p_pbmc10k_filt.h5",use.names = T)$`Gene Expression`
      srat_3p   <- CreateSeuratObject(matrix_3p,project = "pbmc10k_3p")
      srat_5p   <- CreateSeuratObject(matrix_5p,project = "pbmc10k_5p")
      # 节省内存移除一些加载
      rm(matrix_3p)
      rm(matrix_5p)
      # Let’s calculate the fractions of mitochondrial genes and ribosomal proteins, and do quick-and-dirty filtering of the datasets:
      srat_3p[["percent.mt"]]  <- PercentageFeatureSet(srat_3p, pattern = "^MT-")
      srat_3p[["percent.rbp"]] <- PercentageFeatureSet(srat_3p, pattern = "^RP[SL]")
      srat_5p[["percent.mt"]]  <- PercentageFeatureSet(srat_5p, pattern = "^MT-")
      srat_5p[["percent.rbp"]] <- PercentageFeatureSet(srat_5p, pattern = "^RP[SL]")
      # 可视化结果
      VlnPlot(srat_3p, features = c("nFeature_RNA","nCount_RNA","percent.mt","percent.rbp"), ncol = 4)
      VlnPlot(srat_5p, features = c("nFeature_RNA","nCount_RNA","percent.mt","percent.rbp"), ncol = 4)
      # 需要对比基因名字是否一致
      table(rownames(srat_3p) %in% rownames(srat_5p)) 
      ## 
      ##  TRUE 
      ## 36601
      # Quick filtering of the datasets removes dying cells and putative doublets:
      srat_3p <- subset(srat_3p, subset = nFeature_RNA > 500 & nFeature_RNA < 5000 & percent.mt < 15)
      srat_5p <- subset(srat_5p, subset = nFeature_RNA > 500 & nFeature_RNA < 5000 & percent.mt < 10)
      # 然后执行正态化以及后续的分析
      pbmc_list <- list()
      pbmc_list[["pbmc10k_3p"]] <- srat_3p
      pbmc_list[["pbmc10k_5p"]] <- srat_5p
      for (i in 1:length(pbmc_list)) {
        pbmc_list[[i]] <- NormalizeData(pbmc_list[[i]], verbose = F)
        pbmc_list[[i]] <- FindVariableFeatures(pbmc_list[[i]], selection.method = "vst", nfeatures = 2000, verbose = F)
      }
      # After this, we use the two following Seurat commands to find integration anchors and actually perform integration. This takes about 10 min:
      pbmc_anchors    <- FindIntegrationAnchors(object.list = pbmc_list, dims = 1:30)
      ## Warning in CheckDuplicateCellNames(object.list = object.list): Some cell names
      ## are duplicated across objects provided. Renaming to enforce unique cell names.
      ## Computing 2000 integration features
      ## Scaling features for provided objects
      ## Finding all pairwise anchors
      ## Running CCA
      ## Merging objects
      ## Finding neighborhoods
      ## Finding anchors
      ##  Found 21626 anchors
      ## Filtering anchors
      ##  Retained 8832 anchors
      pbmc_seurat     <- IntegrateData(anchorset = pbmc_anchors, dims = 1:30)
      ## Merging dataset 1 into 2
      ## Extracting anchors for merged samples
      ## Finding integration vectors
      ## Finding integration vector weights
      ## Integrating data
      # Let’s remove all the datastructures we’re not using to save the RAM:
      rm(pbmc_list)
      rm(pbmc_anchors)
      # Seurat integration creates a unified object that contains both original data (‘RNA’ assay) as well as integrated data (‘integrated’ assay). Let’s set the assay to RNA and visualize the datasets before integration.
      DefaultAssay(pbmc_seurat) <- "RNA"
      # Let’s do normalization, HVG finding, scaling, PCA, and UMAP on the un-integrated (RNA) assay:
      pbmc_seurat <- NormalizeData(pbmc_seurat, verbose = F)
      pbmc_seurat <- FindVariableFeatures(pbmc_seurat, selection.method = "vst", nfeatures = 2000, verbose = F)
      pbmc_seurat <- ScaleData(pbmc_seurat, verbose = F)
      pbmc_seurat <- RunPCA(pbmc_seurat, npcs = 30, verbose = F)
      pbmc_seurat <- RunUMAP(pbmc_seurat, reduction = "pca", dims = 1:30, verbose = F)
      ## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
      ## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
      ## This message will be shown once per session
      # UMAP plot of the datasets before integration shows clear separation. Note that we can use patchwork syntax with Seurat plotting functions:
      DimPlot(pbmc_seurat,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and 10k 5' PBMC cells, before integration")
      # 发现未整合的数据靠的分厂的远
      # Now let’s change the assay to integrated and do the same do the same thing in the integrated assay (it’s already normalized and HVGs are selected):
      DefaultAssay(pbmc_seurat) <- "integrated"
      pbmc_seurat <- ScaleData(pbmc_seurat, verbose = F)
      pbmc_seurat <- RunPCA(pbmc_seurat, npcs = 30, verbose = F)
      pbmc_seurat <- RunUMAP(pbmc_seurat, reduction = "pca", dims = 1:30, verbose = F)
      # Finally, let’s plot the integrated UMAP: 就发现很好的整合了
      DimPlot(pbmc_seurat, reduction = "umap") + plot_annotation(title = "10k 3' PBMC and 10k 5' PBMC cells, after integration (Seurat 3)")
      # The data are visibly very nicely integrated. Let’s try a split plot, which should make the comparison easier:
      DimPlot(pbmc_seurat, reduction = "umap", split.by = "orig.ident") + NoLegend()
      # Now let’s cluster the integrated matrix and look how clusters are distributed between the two sets:
      pbmc_seurat <- FindNeighbors(pbmc_seurat, dims = 1:30, k.param = 10, verbose = F)
      pbmc_seurat <- FindClusters(pbmc_seurat, verbose = F)
      DimPlot(pbmc_seurat,label = T) + NoLegend()
      # We can now calculate the number of cells in each cluster that came for either 3’ or the 5’ dataset:
      count_table <- table(pbmc_seurat@meta.data$seurat_clusters, pbmc_seurat@meta.data$orig.ident)
      count_table
      ##     
      ##      pbmc10k_3p pbmc10k_5p
      ##   0        1313       2140
      ##   1        1427        945
      ##   2        1214        994
      ##   3         898       1110
      ##   4         612        769
      ##   5         576        437
      ##   6         401        603
      ##   7         338        646
      ##   8         311        403
      ##   9         359        223
      ##   10        251        292
      ##   11        347        131
      ##   12        202        240
      ##   13        136        288
      ##   14        251         61
      ##   15        109        190
      ##   16        118        143
      ##   17         76         92
      ##   18         93         61
      ##   19         55         87
      ##   20         35         35
      ##   21         24         43
      ##   22         13         40
      ##   23         15         10
      ##   24         14          5
      ##   25          2         12
      # Let’s plot the distribution among clusters using our custom function:
      plot_integrated_clusters(pbmc_seurat) 
      ## Using cluster as id variables
      # 以下命令删除并且进行下一步的hrmony整合
      rm(pbmc_seurat)
      # Harmony, 3’ vs 5’ 10k PBMC
      pbmc_harmony    <- merge(srat_3p,srat_5p)
      ## Warning in CheckDuplicateCellNames(object.list = objects): Some cell names are
      ## duplicated across objects provided. Renaming to enforce unique cell names.
      # Now let’s do the same as we did before:
      pbmc_harmony <- NormalizeData(pbmc_harmony, verbose = F)
      pbmc_harmony <- FindVariableFeatures(pbmc_harmony, selection.method = "vst", nfeatures = 2000, verbose = F)
      pbmc_harmony <- ScaleData(pbmc_harmony, verbose = F)
      pbmc_harmony <- RunPCA(pbmc_harmony, npcs = 30, verbose = F)
      pbmc_harmony <- RunUMAP(pbmc_harmony, reduction = "pca", dims = 1:30, verbose = F)
      # Let’s plot the “before” plot again: 先可视化非整合的数据
      DimPlot(pbmc_harmony,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and 10k 5' PBMC cells, before integration")
      # Use RunHarmony to run it on the combined Seurat object using orig.ident as batch:
      pbmc_harmony <- pbmc_harmony %>% RunHarmony("orig.ident", plot_convergence = T)
      ## Harmony 1/10
      ## Harmony 2/10
      ## Harmony 3/10
      ## Harmony 4/10
      ## Harmony 5/10
      ## Harmony converged after 5 iterations
      ## Warning: Invalid name supplied, making object name syntactically valid. New
      ## object name is Seurat..ProjectDim.RNA.harmony; see ?make.names for more details
      ## on syntax validity
      # Check the generated embeddings:
      harmony_embeddings <- Embeddings(pbmc_harmony, 'harmony')
      harmony_embeddings[1:5, 1:5]
      ##                       harmony_1 harmony_2 harmony_3  harmony_4 harmony_5
      ## AAACCCACATAACTCG-1_1  -9.206607 -2.351619 -2.374652  -1.897186 -1.011885
      ## AAACCCACATGTAACC-1_1   7.124223 21.600131 -0.292039   1.530283 -5.792142
      ## AAACCCAGTGAGTCAG-1_1 -18.134725  3.405369  5.256459   4.220001  3.961466
      ## AAACGAACAGTCAGTT-1_1 -18.103262 15.279955 12.301681 -18.115094 31.785955
      ## AAACGAACATTCGGGC-1_1  11.097966 -2.330278 -2.723953   1.546468  1.552332
      # Check the PCA plot after:
      p1 <- DimPlot(object = pbmc_harmony, reduction = "harmony", pt.size = .1, group.by = "orig.ident") + NoLegend()
      p2 <- VlnPlot(object = pbmc_harmony, features = "harmony_1", group.by = "orig.ident", pt.size = .1) + NoLegend()
      plot_grid(p1,p2)
      ## 可以看到pca和voil图很好的一致
      # Do UMAP and clustering:
      pbmc_harmony <- pbmc_harmony %>% 
        RunUMAP(reduction = "harmony", dims = 1:30, verbose = F) %>% 
        FindNeighbors(reduction = "harmony", k.param = 10, dims = 1:30) %>% 
        FindClusters() %>% 
        identity()
      ## Computing nearest neighbor graph
      ## Computing SNN
      ## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
      ## 
      ## Number of nodes: 19190
      ## Number of edges: 280811
      ## 
      ## Running Louvain algorithm...
      ## Maximum modularity in 10 random starts: 0.8952
      ## Number of communities: 26
      ## Elapsed time: 3 seconds
      # Finally, so same UMAP plots of integrated datasets as above:
      pbmc_harmony <- SetIdent(pbmc_harmony,value = "orig.ident")
      DimPlot(pbmc_harmony,reduction = "umap") + plot_annotation(title = "10k 3' PBMC and 10k 5' PBMC cells, after integration (Harmony)")
      # 以下命令分开展示
      DimPlot(pbmc_harmony, reduction = "umap", group.by = "orig.ident", pt.size = .1, split.by = 'orig.ident') + NoLegend()
      # These look a bit worse than ones from Seurat:结果看起来不好
      pbmc_harmony <- SetIdent(pbmc_harmony,value = "seurat_clusters")
      DimPlot(pbmc_harmony,label = T) + NoLegend()
      # Finally, let’s take a look at the cluster content:
      plot_integrated_clusters(pbmc_harmony)
      ## Using cluster as id variables
      # 以下命令移除然后使用新的方法
      rm(pbmc_harmony)
      # 后续不做太多赘述,一般都是要尝试seurat harmony liger三种方法来找到最合适的合并方法,然后进行分析

      最后,如果感兴趣请加微信:cll7658,如果你有觉得比较重要的讨论即可。

      请关注“恒诺新知”微信公众号,感谢“R语言“,”数据那些事儿“,”老俊俊的生信笔记“,”冷🈚️思“,“珞珈R”,“生信星球”的支持!

      标签:单细胞测序

      • 分享:
      weinfoauthor
      weinfoauthor

      1233

      上一篇文章

      pre01-scrublet检测双细胞
      2021年8月11日

      下一篇文章

      Github最近更新了密码验证,你遇到了吗?
      2021年8月20日

      你可能也喜欢

      2-1675088548
      lncRNA和miRNA生信分析系列讲座免费视频课和课件资源包,干货满满
      30 1月, 2023
      9-1675131201
      如何快速批量修改 Git 提交记录中的用户信息
      26 1月, 2023
      8-1678501786
      肿瘤细胞通过改变CD8+ T细胞中的丙酮酸利用和琥珀酸信号来调控抗肿瘤免疫应答。
      7 12月, 2022

      留言 取消回复

      要发表评论,您必须先登录。

      搜索

      分类

      • R语言
      • TCGA数据挖掘
      • 单细胞RNA-seq测序
      • 在线会议直播预告与回放
      • 数据分析那些事儿分类
      • 未分类
      • 生信星球
      • 老俊俊的生信笔记

      投稿培训

      免费

      alphafold2培训

      免费

      群晖配置培训

      免费

      最新博文

      Nature | 单细胞技术揭示衰老细胞与肌肉再生
      301月2023
      lncRNA和miRNA生信分析系列讲座免费视频课和课件资源包,干货满满
      301月2023
      如何快速批量修改 Git 提交记录中的用户信息
      261月2023
      logo-eduma-the-best-lms-wordpress-theme

      (00) 123 456 789

      weinfoadmin@weinformatics.cn

      恒诺新知

      • 关于我们
      • 博客
      • 联系
      • 成为一名讲师

      链接

      • 课程
      • 事件
      • 展示
      • 问答

      支持

      • 文档
      • 论坛
      • 语言包
      • 发行状态

      推荐

      • iHub汉语代码托管
      • iLAB耗材管理
      • WooCommerce
      • 丁香园论坛

      weinformatics 即 恒诺新知。ICP备案号:粤ICP备19129767号

      • 关于我们
      • 博客
      • 联系
      • 成为一名讲师

      要成为一名讲师吗?

      加入数以千计的演讲者获得100%课时费!

      现在开始

      用你的站点账户登录

      忘记密码?

      还不是会员? 现在注册

      注册新帐户

      已经拥有注册账户? 现在登录

      close
      会员购买 你还没有登录,请先登录
      • ¥99 VIP-1个月
      • ¥199 VIP-半年
      • ¥299 VIP-1年
      在线支付 激活码

      立即支付
      支付宝
      微信支付
      请使用 支付宝 或 微信 扫码支付
      登录
      注册|忘记密码?