• Overview
  • FASTQ+
  • PISA
  • Yano
  • My GitHub Page
  • Discussions

Spatial dissimilarity analysis for single-cell trajectories and supervised embeddings

On this page

  • Prepare raw data
  • Cell clustering and annotation
  • Cell development trajetory construction
  • Perform alternative splicing analysis along the cell trajetory
  • Perform alternative splicing on a supervised two dimension space

In the previous vignette, we performed cell clustering using unsupervised methods. Typically, a PCA or integrated space (e.g., Harmony) is generated during this process, and we then calculate the cell weight matrix directly from the cell embedding space. This approach is convenient for running the pipeline, and provide the general result in all cell population. Furthermore, for specific signaling or biological pathways, it is often more informative to focus on a ‘supervised’ cell ordering. This vignette demonstrates how to perform spatial dissimilarity test in low-dimensional spaces, such as cell trajectory.

Prepare raw data

We used the human testis single-cell atlas dataset for this analysis. The full dataset is publicly available at GSE112013. For this vignette, we only used replicate 1.

## We first download the BAM files and GTF file
wget -c https://sra-pub-src-1.s3.amazonaws.com/SRR6860519/14515X1.bam.1 -O donor1_rep1.bam
wget -c https://sra-pub-src-1.s3.amazonaws.com/SRR6860520/14606X1.bam.1 -O donor1_rep2.bam

## Please Note: we use a different annotation file from previos vignette, because the BAM file use NCBI style chromosome name here.
wget https://ftp.ensembl.org/pub/release-114/gtf/homo_sapiens/Homo_sapiens.GRCh38.114.chr.gtf.gz

## Index the bam files for track plots
samtools index donor1_rep1.bam
samtools index donor1_rep2.bam

## Then we use PISA to generate feature counts
PISA anno -gtf Homo_sapiens.GRCh38.114.chr.gtf.gz -exon -psi -o d1_rep2_anno.bam donor1_rep2.bam
PISA anno -gtf Homo_sapiens.GRCh38.114.chr.gtf.gz -exon -psi -o d1_rep1_anno.bam donor1_rep1.bam

mkdir exp1
mkdir exp2
mkdir exon1
mkdir exon2
mkdir junction1
mkdir junction2
mkdir exclude1
mkdir exclude2

PISA count -tag CB -umi UB -anno-tag GN -outdir exp1 d1_rep1_anno.bam
PISA count -tag CB -umi UB -anno-tag GN -outdir exp2 d1_rep2_anno.bam
PISA count -tag CB -umi UB -anno-tag EX -outdir exon1 d1_rep1_anno.bam
PISA count -tag CB -umi UB -anno-tag EX -outdir exon2 d1_rep2_anno.bam
PISA count -tag CB -umi UB -anno-tag JC -outdir junction1 d1_rep1_anno.bam
PISA count -tag CB -umi UB -anno-tag JC -outdir junction2 d1_rep2_anno.bam

# Note: Although we generate the excluded reads here, we do not use them in this demonstration for the sake of simplicity. However, it is recommended to test them in real cases, so I’ve kept the data available for you to use in your own analyses.
PISA count -tag CB -umi UB -anno-tag ER -outdir exclude1 d1_rep1_anno.bam
PISA count -tag CB -umi UB -anno-tag ER -outdir exclude2 d1_rep2_anno.bam

Cell clustering and annotation

require(Seurat)
require(ggplot2)
require(dplyr)
require(Yano)
Warning: failed to assign RegisteredNativeSymbol for anno_conseq to anno_conseq
since anno_conseq is already defined in the 'Yano' namespace
Warning: failed to assign RegisteredNativeSymbol for anno_gene to anno_gene
since anno_gene is already defined in the 'Yano' namespace
Warning: failed to assign RegisteredNativeSymbol for anno_vcf to anno_vcf since
anno_vcf is already defined in the 'Yano' namespace
Warning: failed to assign RegisteredNativeSymbol for gtf2db to gtf2db since
gtf2db is already defined in the 'Yano' namespace
Warning: failed to assign RegisteredNativeSymbol for lognorm to lognorm since
lognorm is already defined in the 'Yano' namespace
exp1 <- ReadPISA("exp1/", prefix = "d1-rep1-")
exp2 <- ReadPISA("exp2/", prefix = "d1-rep2-")

exp <- mergeMatrix(exp1, exp2)

obj <- QuickRecipe(exp, min.features = 800, min.cells = 100, resolution=0.2)
rm(exp1, exp2, exp)

obj$replicate <- gsub("d1-(.*)-.*-1", "\\1", colnames(obj))
DimPlot(obj, label = TRUE, label.size = 5, group.by = 'seurat_clusters')

FeaturePlot(obj, features = c("CD14", "VIM", "VWF", "ACTA2", "DLK1", "DAZL", "MAGEA4", "UTF1", "ID4", "FGFR3", "KIT", "DMRT1", "DMRTB1", "STRA8", "SYCP3", "SPO11", "MLH3", "ZPBP", "TNP1", "PRM2"), ncol=5, order=TRUE) & NoAxes() & NoLegend()

new.cluster.ids <- c("Sperm", "Early primary S'cytes", "Leydig cells", "Round S'tids", "Differentiating S'gonia", "Elongated S'tids", "Endothelial cells", "SSCs", "Myoid cells", "Sertoli cells", "Macrophages", "Later primary S'cytes", "Sertoli-like")
names(new.cluster.ids) <- levels(obj)
obj <- RenameIdents(obj, new.cluster.ids)
DimPlot(obj, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()

Cell development trajetory construction

We now select germline cells from the dataset and construct the cell development trajectory using a principal curve. Although there are many well-documented methods for constructing linear trajectories, here we use a straightforward and perhaps the simplest approach purely for demonstration purposes.

sel <- DimSelector(obj)
obj.sel <- obj[,sel]

require(princurve)
Loading required package: princurve
emb <- Embeddings(obj.sel, 'umap')
emb0 <- as.data.frame(emb)
emb0$idents <- Idents(obj)[rownames(emb0)]
emb0 %>% group_by(idents) %>% summarize(x=median(umap_1),y=median(umap_2)) -> emb1
emb2 <- as.matrix(emb1[,c("x","y")])
rownames(emb2) <- emb1$idents
emb2 <- emb2[c("SSCs", "Differentiating S'gonia", "Early primary S'cytes", "Later primary S'cytes", "Round S'tids", "Elongated S'tids", "Sperm"),]
pc <- princurve::principal_curve(x = emb, start = emb2)
s <- as.data.frame(pc$s)
colnames(s) <- c("x", "y")
emb0 <- cbind(emb0, s)
emb0 <- emb0[pc$ord, ]

ggplot(emb0)+ geom_point(aes(umap_1, umap_2, color = idents)) + geom_path(aes(x,y), color = "blue", linewidth=1)+ geom_point(data = emb2, aes(x, y), size=2) 

Perform alternative splicing analysis along the cell trajetory

cells <- colnames(obj.sel)

exon1 <- ReadPISA("exon1/", prefix = "d1-rep1-", cells = cells)
exon2 <- ReadPISA("exon2/", prefix = "d1-rep2-", cells = cells)
exon <- mergeMatrix(exon1, exon2)

junction1 <- ReadPISA("junction1/", prefix = "d1-rep1-", cells = cells)
junction2 <- ReadPISA("junction2/", prefix = "d1-rep2-", cells = cells)
junction <- mergeMatrix(junction1, junction2)

obj.sel[['exon']] <- CreateAssayObject(exon, min.cells = 100)
obj.sel[['junction']] <- CreateAssayObject(junction, min.cells = 100)
rm(exon, exon1,exon2, junction, junction1, junction2)

order.cells <- rownames(emb0)

DefaultAssay(obj.sel) <- "exon"
obj.sel <- NormalizeData(obj.sel)
obj.sel <- ParseExonName(obj.sel)

When setup order cells, the weight matrix name will be named to “trajectory_wm”.

obj.sel <- RunAutoCorr(obj.sel, order.cells = order.cells, wm.name = "trajectory_wm")
Working on assay : exon
Run autocorrelation test for 72346 features.
Runtime : 7.999062 secs
69085 autocorrelated features.
obj.sel <- RunSDT(obj.sel, wm.name = "trajectory_wm", bind.name = "gene_name", bind.assay = "RNA")
Working on assay exon.
Working on binding assay RNA.
Use predefined weight matrix "trajectory_wm".
Processing 69085 features.
Processing 13893 binding features.
Retrieve binding data from assay RNA.
Use "data" layer for test features and binding features.
Using 95 threads.
Runtime : 27.36017 secs.
DefaultAssay(obj.sel) <- "junction"
obj.sel <- NormalizeData(obj.sel)
obj.sel <- ParseExonName(obj.sel)
Working on assay junction
obj.sel <- RunAutoCorr(obj.sel, order.cells = order.cells, wm.name = "trajectory_wm")
Working on assay : junction
Run autocorrelation test for 7583 features.
Runtime : 0.7428648 secs
7350 autocorrelated features.
obj.sel <- RunSDT(obj.sel, wm.name = "trajectory_wm", bind.name = "gene_name", bind.assay = "RNA")
Working on assay junction.
Working on binding assay RNA.
Use predefined weight matrix "trajectory_wm".
Processing 7350 features.
Processing 3724 binding features.
Retrieve binding data from assay RNA.
Use "data" layer for test features and binding features.
Using 95 threads.
Runtime : 3.308762 secs.
FbtPlot(obj.sel, val = "gene_name.padj", assay = c("exon", "junction"), shape.by = "assay", color.by = "assay")

Meta(obj.sel, assay = "exon") %>% filter(gene_name.padj <1e-20) %>% DT::datatable()
Meta(obj.sel, assay = "junction") %>% filter(gene_name.padj <1e-20) %>% DT::datatable()
FeaturePlot(obj.sel, features = c("3:52413996-52414048/+/PHF7", "3:52414048-52414496/+/PHF7", "PHF7"), ncol=3, order=TRUE)
Warning: Could not find 3:52413996-52414048/+/PHF7 in the default search
locations, found in 'exon' assay instead
Warning: Could not find PHF7 in the default search locations, found in 'RNA'
assay instead

Perform alternative splicing on a supervised two dimension space

We can also perform the analysis using a supervised two-dimensional cell space. As an example, we utilize Seurat’s built-in cell cycle scoring scheme for demonstration purposes.

DefaultAssay(obj.sel) <- "RNA"

s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes

obj.sel[['Group']] <- Idents(obj.sel)
obj.sel <- CellCycleScoring(obj.sel, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)
Warning: The following features are not present in the object: MLF1IP, E2F8,
not searching for symbol synonyms
Warning: The following features are not present in the object: FAM64A, AURKB,
HN1, not searching for symbol synonyms
df <- obj.sel[[]]

p1 <- ggplot(df) + geom_point(aes(x=S.Score, y = G2M.Score, color = Phase))
p2 <- ggplot(df) + geom_point(aes(x=S.Score, y = G2M.Score, color = Group))
p1 + p2

# Create cell cycle coordinate space with the module scores
ccc <- as.matrix(df[, c("S.Score", "G2M.Score")])
head(ccc)
                              S.Score   G2M.Score
d1-rep1-GACAGAGTCATACGGT-1 0.37908301  0.27462393
d1-rep1-GCTCCTATCATCGCTC-1 0.40969930  0.24794880
d1-rep1-GTAGTCATCCGAACGC-1 0.06002289  0.09029104
d1-rep1-TTTCCTCCACTATCTT-1 0.37392161  0.02509706
d1-rep1-TACGGTAGTAAGGGCT-1 0.51073399  0.28205031
d1-rep1-AGGGTGAAGATATGCA-1 0.24597679 -0.02650419
colnames(ccc) <- c("CCC_1", "CCC_2")

obj.sel[['ccc']] <- CreateDimReducObject(embeddings = ccc, assay = "RNA")

DefaultAssay(obj.sel) <- "exon"
obj.sel <- RunAutoCorr(obj.sel, reduction = "ccc", wm.name = "ccc_wm", dims=1:2)
Working on assay : exon
Run autocorrelation test for 72346 features.
Runtime : 6.863887 secs
60820 autocorrelated features.
obj.sel <- RunSDT(obj.sel, wm.name = "ccc_wm", bind.name = "gene_name", bind.assay = "RNA", prefix = "ccc")
Working on assay exon.
Working on binding assay RNA.
Use predefined weight matrix "ccc_wm".
Processing 60820 features.
Processing 12689 binding features.
Retrieve binding data from assay RNA.
Use "data" layer for test features and binding features.
Using 95 threads.
Runtime : 24.36367 secs.
DefaultAssay(obj.sel) <- "junction"
obj.sel <- RunAutoCorr(obj.sel, reduction = "ccc", wm.name = "ccc_wm", dims=1:2)
Working on assay : junction
Run autocorrelation test for 7583 features.
Runtime : 0.9042523 secs
6884 autocorrelated features.
obj.sel <- RunSDT(obj.sel, wm.name = "ccc_wm", bind.name = "gene_name", bind.assay = "RNA", prefix = "ccc")
Working on assay junction.
Working on binding assay RNA.
Use predefined weight matrix "ccc_wm".
Processing 6884 features.
Processing 3549 binding features.
Retrieve binding data from assay RNA.
Use "data" layer for test features and binding features.
Using 95 threads.
Runtime : 3.6925 secs.
FbtPlot(obj.sel, val = "ccc.padj", assay = c("exon", "junction"), shape.by = "assay", color.by = "assay")

Meta(obj.sel, assay = "exon") %>% filter(ccc.padj<1e-20) %>% DT::datatable()
FeaturePlot(obj.sel, reduction =  "ccc", features = c("7:74196642-74197050/+/EIF4H", "EIF4H"), order = TRUE)
Warning: Could not find 7:74196642-74197050/+/EIF4H in the default search
locations, found in 'exon' assay instead
Warning: Could not find EIF4H in the default search locations, found in 'RNA'
assay instead

gtf <- gtf2db("./Homo_sapiens.GRCh38.114.chr.gtf.gz")
[2026-06-15 14:46:14] GTF loading..
[2026-06-15 14:46:51] Load 78686 genes.
[2026-06-15 14:46:51] Load time : 37.057 sec
bamfiles <- list(rep1 = "./donor1_rep1.bam", rep2 = "./donor1_rep2.bam")
cell.list <- split(obj.sel$Phase, obj.sel$replicate)

cell.list1 <- lapply(cell.list, function(x) {
  nm <- names(x)
  names(x) <- gsub(".*rep[12]-(.*)","\\1",nm)
  x
})
str(cell.list1)
List of 2
 $ rep1: Named chr [1:954] "S" "S" "G2M" "S" ...
  ..- attr(*, "names")= chr [1:954] "GACAGAGTCATACGGT-1" "GCTCCTATCATCGCTC-1" "GTAGTCATCCGAACGC-1" "TTTCCTCCACTATCTT-1" ...
 $ rep2: Named chr [1:2023] "S" "S" "S" "S" ...
  ..- attr(*, "names")= chr [1:2023] "GGCTCGACATGTCTCC-1" "CCTATTAGTCCAGTAT-1" "GTAACGTCACTTACGA-1" "AAGGCAGCACGTCTCT-1" ...
TrackPlot(bamfile = bamfiles, cell.group = cell.list1, gtf = gtf, gene = "EIF4H", highlights = c(74196642,74197050), junc=TRUE)

Command(obj.sel)
 [1] "NormalizeData.RNA"            "FindVariableFeatures.RNA"    
 [3] "ScaleData.RNA"                "RunPCA.RNA"                  
 [5] "FindNeighbors.RNA.pca"        "FindClusters"                
 [7] "RunUMAP.RNA.pca"              "NormalizeData.exon"          
 [9] "ParseExonName.exon"           "RunAutoCorr.exon.pca"        
[11] "NormalizeData.junction"       "ParseExonName.junction"      
[13] "RunAutoCorr.junction.pca"     "RunAutoCorr.exon.ccc"        
[15] "SetAutoCorrFeatures.exon"     "RunSDT.exon"                 
[17] "RunAutoCorr.junction.ccc"     "SetAutoCorrFeatures.junction"
[19] "RunSDT.junction"             
sessionInfo()
R version 4.5.3 (2026-03-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future_1.70.0      princurve_2.1.6    Yano_1.5           dplyr_1.2.0       
[5] ggplot2_4.0.2      Seurat_5.4.0       SeuratObject_5.3.0 sp_2.2-1          

loaded via a namespace (and not attached):
  [1] deldir_2.0-4           pbapply_1.7-4          gridExtra_2.3         
  [4] rlang_1.1.7            magrittr_2.0.4         RcppAnnoy_0.0.23      
  [7] otel_0.2.0             spatstat.geom_3.7-3    matrixStats_1.5.0     
 [10] ggridges_0.5.7         compiler_4.5.3         systemfonts_1.3.2     
 [13] png_0.1-9              vctrs_0.7.2            reshape2_1.4.5        
 [16] stringr_1.6.0          pkgconfig_2.0.3        fastmap_1.2.0         
 [19] labeling_0.4.3         promises_1.5.0         rmarkdown_2.30        
 [22] purrr_1.2.1            xfun_0.57              cachem_1.1.0          
 [25] jsonlite_2.0.0         goftest_1.2-3          later_1.4.8           
 [28] spatstat.utils_3.2-2   irlba_2.3.7            parallel_4.5.3        
 [31] cluster_2.1.8.2        R6_2.6.1               ica_1.0-3             
 [34] bslib_0.10.0           stringi_1.8.7          RColorBrewer_1.1-3    
 [37] spatstat.data_3.1-9    reticulate_1.45.0      parallelly_1.46.1     
 [40] spatstat.univar_3.1-7  jquerylib_0.1.4        lmtest_0.9-40         
 [43] scattermore_1.2        Rcpp_1.1.1             knitr_1.51            
 [46] tensor_1.5.1           future.apply_1.20.2    zoo_1.8-15            
 [49] sctransform_0.4.3      httpuv_1.6.17          Matrix_1.7-5          
 [52] splines_4.5.3          igraph_2.2.2           tidyselect_1.2.1      
 [55] viridis_0.6.5          abind_1.4-8            yaml_2.3.12           
 [58] spatstat.random_3.4-5  codetools_0.2-20       miniUI_0.1.2          
 [61] spatstat.explore_3.8-0 listenv_0.10.1         lattice_0.22-9        
 [64] tibble_3.3.1           plyr_1.8.9             withr_3.0.2           
 [67] shiny_1.13.0           S7_0.2.1               ROCR_1.0-12           
 [70] evaluate_1.0.5         Rtsne_0.17             fastDummies_1.7.5     
 [73] survival_3.8-6         polyclip_1.10-7        fitdistrplus_1.2-6    
 [76] pillar_1.11.1          KernSmooth_2.23-26     DT_0.34.0             
 [79] plotly_4.12.0          generics_0.1.4         RcppHNSW_0.6.0        
 [82] scales_1.4.0           gtools_3.9.5           globals_0.19.1        
 [85] xtable_1.8-8           glue_1.8.0             lazyeval_0.2.2        
 [88] tools_4.5.3            data.table_1.18.2.1    RSpectra_0.16-2       
 [91] RANN_2.6.2             dotCall64_1.2          cowplot_1.2.0         
 [94] grid_4.5.3             tidyr_1.3.2            crosstalk_1.2.2       
 [97] nlme_3.1-168           patchwork_1.3.2        cli_3.6.5             
[100] spatstat.sparse_3.1-0  spam_2.11-3            viridisLite_0.4.3     
[103] uwot_0.2.4             gtable_0.3.6           sass_0.4.10           
[106] digest_0.6.39          progressr_0.18.0       ggrepel_0.9.8         
[109] htmlwidgets_1.6.4      farver_2.1.2           htmltools_0.5.9       
[112] lifecycle_1.0.5        httr_1.4.8             mime_0.13             
[115] MASS_7.3-65           
Back to top