Run Canek on a toy example

library(Canek)

# Functions
## Function to plot the first two PCA coordinates, coloured by a factor of labels
plotPCA <- function(pcaData = NULL, label = NULL, legPosition = "topleft"){
  # Integer codes of the factor levels are used as point colours
  col <- as.integer(label)
  plot(x = pcaData[,"PC1"], y = pcaData[,"PC2"],
       col = col, cex = 0.75, pch = 19,
       xlab = "PC1", ylab = "PC2")
  # Legend colours follow the level order so they always match the points
  legend(legPosition,  pch = 19,
         legend = levels(label),
         col = seq_along(levels(label)))
}

Load the data

In this toy example we use the two simulated batches included in the SimBatches data from the Canek package. SimBatches is a list containing:

  • batches: simulated scRNA-seq datasets with genes (rows) and cells (columns). Simulations were performed using Splatter.
  • cell_types: a factor containing the cell-type labels of the cells in the batches

lsData <- list(B1 = SimBatches$batches[[1]], B2 = SimBatches$batches[[2]])
batch <- factor(c(rep("Batch-1", ncol(lsData[[1]])),
                  rep("Batch-2", ncol(lsData[[2]]))))
celltype <- SimBatches$cell_types
table(batch)
#> batch
#> Batch-1 Batch-2 
#>     631     948
table(celltype)
#> celltype
#> Cell Type 1 Cell Type 2 Cell Type 3 Cell Type 4 
#>        1451          53          38          37
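
The batch factor above is built by hand for two batches. As a minimal sketch of the same idea for any number of batches, the labels can be derived from the list itself. The matrices and the names toyBatches, nCells and toyBatch are made up for illustration and are not part of Canek.

```r
# Toy stand-ins for expression matrices (genes in rows, cells in columns).
set.seed(1)
toyBatches <- list(B1 = matrix(rpois(5 * 3, lambda = 2), nrow = 5),
                   B2 = matrix(rpois(5 * 4, lambda = 2), nrow = 5),
                   B3 = matrix(rpois(5 * 2, lambda = 2), nrow = 5))

# Number of cells (columns) in each batch.
nCells <- vapply(toyBatches, ncol, integer(1))

# One label per cell, repeated according to each batch's column count.
toyBatch <- factor(rep(names(toyBatches), times = nCells),
                   levels = names(toyBatches))
table(toyBatch)
```

Setting the levels explicitly keeps the label order equal to the list order, which is the order of the columns after the matrices are bound together.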

PCA before correction

We perform Principal Component Analysis (PCA) on the joined datasets and draw a scatter plot of the first two PCs. The batch effect causes cells to group by batch.

data <- Reduce(cbind, lsData)
pcaData <- prcomp(t(data), center = TRUE, scale. = TRUE)$x
plotPCA(pcaData = pcaData, label = batch, legPosition = "bottomleft")

plotPCA(pcaData = pcaData, label = celltype, legPosition = "bottomleft")
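
The scatter plots show only PC1 and PC2, so it can help to check how much of the total variance those components carry. The following is a rough sketch on simulated data, not on SimBatches. The matrices b1, b2 and toy are hypothetical, and the second batch receives an artificial per-gene offset to mimic a batch effect.

```r
set.seed(42)
nGenes <- 50

# Two hypothetical batches (genes in rows, cells in columns).
b1 <- matrix(rnorm(nGenes * 30), nrow = nGenes)
# A length-nGenes vector is recycled down each column, so every gene
# gets its own constant offset across all cells of the second batch.
b2 <- matrix(rnorm(nGenes * 40), nrow = nGenes) + rnorm(nGenes, sd = 2)
toy <- cbind(b1, b2)

# Same PCA call as in the tutorial, but keeping the full prcomp object.
pca <- prcomp(t(toy), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component.
varExplained <- pca$sdev^2 / sum(pca$sdev^2)
round(head(varExplained, 5), 3)
```

If most of the variance sits in PC1 and the cells separate by batch along it, the batch effect dominates the data.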

Run Canek

We correct the toy batches using the function RunCanek. This function accepts:

  • List of matrices
  • Seurat object
  • List of Seurat objects
  • SingleCellExperiment object
  • List of SingleCellExperiment objects

In this example we use the list of matrices created above.

data <- RunCanek(lsData)
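
The simulated batches here have the same genes in the same rows, which is why they could be joined with cbind earlier. For your own data this may not hold. The sketch below restricts a list of matrices to their common genes before any joint analysis. This is an assumption about sensible preprocessing, not a documented requirement of RunCanek, and m1, m2 and myBatches are hypothetical.

```r
# Hypothetical batches with partially overlapping gene sets.
set.seed(3)
m1 <- matrix(rpois(4 * 3, lambda = 2), nrow = 4,
             dimnames = list(paste0("Gene", 1:4), paste0("c", 1:3)))
m2 <- matrix(rpois(4 * 2, lambda = 2), nrow = 4,
             dimnames = list(paste0("Gene", 2:5), paste0("d", 1:2)))
myBatches <- list(B1 = m1, B2 = m2)

# Genes present in every batch.
sharedGenes <- Reduce(intersect, lapply(myBatches, rownames))

# Subset every matrix to the shared genes, in the same row order.
myBatches <- lapply(myBatches, function(m) m[sharedGenes, , drop = FALSE])
vapply(myBatches, nrow, integer(1))
```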

PCA after correction

We perform PCA of the corrected datasets and plot the first two PCs. After correction, the cells group by their corresponding cell type.

pcaData <- prcomp(t(data), center = TRUE, scale. = TRUE)$x
plotPCA(pcaData = pcaData, label = batch, legPosition = "topleft")

plotPCA(pcaData = pcaData, label = celltype, legPosition = "topleft")
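
The plots give a visual impression of how well the batches mix. A crude numerical companion is the distance between the two batch centroids in PC space, which should shrink after a successful correction. The helper centroidGap below is hypothetical and not part of Canek, and the coordinates are simulated rather than taken from the corrected data.

```r
# Hypothetical helper: Euclidean distance between the centroids of two
# groups of cells, given their coordinates (cells in rows).
centroidGap <- function(pcs, label) {
  stopifnot(is.factor(label), nlevels(label) == 2)
  c1 <- colMeans(pcs[label == levels(label)[1], , drop = FALSE])
  c2 <- colMeans(pcs[label == levels(label)[2], , drop = FALSE])
  sqrt(sum((c1 - c2)^2))
}

set.seed(7)
label <- factor(rep(c("Batch-1", "Batch-2"), times = c(50, 60)))

# Well-mixed coordinates: both batches drawn from the same distribution.
mixed <- matrix(rnorm(110 * 2), ncol = 2,
                dimnames = list(NULL, c("PC1", "PC2")))

# Same coordinates with an artificial shift of Batch-2 along PC1.
shifted <- mixed
shifted[label == "Batch-2", "PC1"] <- shifted[label == "Batch-2", "PC1"] + 5

centroidGap(shifted, label)  # large gap: batches are separated
centroidGap(mixed, label)    # small gap: batches overlap
```

With the tutorial's objects, the same helper could be applied to pcaData[, c("PC1", "PC2")] and batch before and after correction. A single centroid distance ignores cell-type structure, so treat it as a quick sanity check rather than a full integration metric.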

Session info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] Canek_0.2.5    rmarkdown_2.29
#> 
#> loaded via a namespace (and not attached):
#>  [1] sass_0.4.9          generics_0.1.3      class_7.3-23       
#>  [4] robustbase_0.99-4-1 lattice_0.22-6      numbers_0.8-5      
#>  [7] digest_0.6.37       magrittr_2.0.3      evaluate_1.0.3     
#> [10] grid_4.4.2          fastmap_1.2.0       jsonlite_1.9.0     
#> [13] Matrix_1.7-2        nnet_7.3-20         mclust_6.1.1       
#> [16] kernlab_0.9-33      codetools_0.2-20    modeltools_0.2-23  
#> [19] jquerylib_0.1.4     cli_3.6.4           rlang_1.1.5        
#> [22] BiocNeighbors_2.1.2 cachem_1.1.0        yaml_2.3.10        
#> [25] fpc_2.2-13          FNN_1.1.4.1         tools_4.4.2        
#> [28] flexmix_2.3-19      parallel_4.4.2      BiocParallel_1.41.2
#> [31] BiocGenerics_0.53.6 buildtools_1.0.0    R6_2.6.1           
#> [34] matrixStats_1.5.0   stats4_4.4.2        lifecycle_1.0.4    
#> [37] S4Vectors_0.45.4    bluster_1.17.0      MASS_7.3-64        
#> [40] irlba_2.3.5.1       cluster_2.1.8       pkgconfig_2.0.3    
#> [43] bslib_0.9.0         Rcpp_1.0.14         DEoptimR_1.1-3-1   
#> [46] xfun_0.51           prabclus_2.3-4      sys_3.4.3          
#> [49] knitr_1.49          htmltools_0.5.8.1   igraph_2.1.4       
#> [52] maketools_1.3.2     compiler_4.4.2      diptest_0.77-1