Spatial dissimilarity analysis for single-cell trajectories and supervised embeddings

In the previous vignette, we performed cell clustering using unsupervised methods. Typically, a PCA or integrated space (e.g., Harmony) is generated during this process, and we then calculate the cell weight matrix directly from the cell embedding space. This approach is convenient for running the pipeline, and provide the general result in all cell population. Furthermore, for specific signaling or biological pathways, it is often more informative to focus on a ‘supervised’ cell ordering. This vignette demonstrates how to perform spatial dissimilarity test in low-dimensional spaces, such as cell trajectory.

Prepare raw data

We used the human testis single-cell atlas dataset for this analysis. The full dataset is publicly available at GSE112013. For this vignette, we only used replicate 1.

## We first download the BAM files and GTF file
wget -c https://sra-pub-src-1.s3.amazonaws.com/SRR6860519/14515X1.bam.1 -O donor1_rep1.bam
wget -c https://sra-pub-src-1.s3.amazonaws.com/SRR6860520/14606X1.bam.1 -O donor1_rep2.bam

## Please Note: we use a different annotation file from previos vignette, because the BAM file use NCBI style chromosome name here.
wget https://ftp.ensembl.org/pub/release-114/gtf/homo_sapiens/Homo_sapiens.GRCh38.114.chr.gtf.gz

## Index the bam files for track plots
samtools index donor1_rep1.bam
samtools index donor1_rep2.bam

## Then we use PISA to generate feature counts
PISA anno -gtf Homo_sapiens.GRCh38.114.chr.gtf.gz -exon -psi -o d1_rep2_anno.bam donor1_rep2.bam
PISA anno -gtf Homo_sapiens.GRCh38.114.chr.gtf.gz -exon -psi -o d1_rep1_anno.bam donor1_rep1.bam

mkdir exp1
mkdir exp2
mkdir exon1
mkdir exon2
mkdir junction1
mkdir junction2
mkdir exclude1
mkdir exclude2

PISA count -tag CB -umi UB -anno-tag GN -outdir exp1 d1_rep1_anno.bam
PISA count -tag CB -umi UB -anno-tag GN -outdir exp2 d1_rep2_anno.bam
PISA count -tag CB -umi UB -anno-tag EX -outdir exon1 d1_rep1_anno.bam
PISA count -tag CB -umi UB -anno-tag EX -outdir exon2 d1_rep2_anno.bam
PISA count -tag CB -umi UB -anno-tag JC -outdir junction1 d1_rep1_anno.bam
PISA count -tag CB -umi UB -anno-tag JC -outdir junction2 d1_rep2_anno.bam

# Note: Although we generate the excluded reads here, we do not use them in this demonstration for the sake of simplicity. However, it is recommended to test them in real cases, so I’ve kept the data available for you to use in your own analyses.
PISA count -tag CB -umi UB -anno-tag ER -outdir exclude1 d1_rep1_anno.bam
PISA count -tag CB -umi UB -anno-tag ER -outdir exclude2 d1_rep2_anno.bam

Cell clustering and annotation

require(Yano)

exp1 <- ReadPISA("exp1/", prefix = "d1-rep1-")
exp2 <- ReadPISA("exp2/", prefix = "d1-rep2-")

exp <- mergeMatrix(exp1, exp2)

obj <- QuickRecipe(exp, min.features = 800, min.cells = 100, resolution=0.2)

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 4400
Number of edges: 134553

Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9639
Number of communities: 13
Elapsed time: 0 seconds

Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session

rm(exp1, exp2, exp)

obj$replicate <- gsub("d1-(.*)-.*-1", "\\1", colnames(obj))
DimPlot(obj, label = TRUE, label.size = 5)

FeaturePlot(obj, features = c("CD14", "VIM", "VWF", "ACTA2", "DLK1", "DAZL", "MAGEA4", "UTF1", "ID4", "FGFR3", "KIT", "DMRT1", "DMRTB1", "STRA8", "SYCP3", "SPO11", "MLH3", "ZPBP", "TNP1", "PRM2"), ncol=5, order=TRUE) & NoAxes() & NoLegend()

Warning: The `slot` argument of `FetchData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
ℹ The deprecated feature was likely used in the Seurat package.
  Please report the issue at <https://github.com/satijalab/seurat/issues>.

new.cluster.ids <- c("Sperm", "Early primary S'cytes", "Leydig cells", "Round S'tids", "Differentiating S'gonia", "Elongated S'tids", "Endothelial cells", "SSCs", "Myoid cells", "Sertoli cells", "Macrophages", "Later primary S'cytes", "Sertoli-like")
names(new.cluster.ids) <- levels(obj)
obj <- RenameIdents(obj, new.cluster.ids)
DimPlot(obj, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()

Cell development trajetory construction

We now select germline cells from the dataset and construct the cell development trajectory using a principal curve. Although there are many well-documented methods for constructing linear trajectories, here we use a straightforward and perhaps the simplest approach purely for demonstration purposes.

sel <- DimSelector(obj)
obj.sel <- obj[,sel]

require(princurve)

Loading required package: princurve

emb <- Embeddings(obj.sel, 'umap')
emb0 <- as.data.frame(emb)
emb0$idents <- Idents(obj)[rownames(emb0)]
emb0 %>% group_by(idents) %>% summarize(x=median(umap_1),y=median(umap_2)) -> emb1
emb2 <- as.matrix(emb1[,c("x","y")])
rownames(emb2) <- emb1$idents
emb2 <- emb2[c("SSCs", "Differentiating S'gonia", "Early primary S'cytes", "Later primary S'cytes", "Round S'tids", "Elongated S'tids", "Sperm"),]
pc <- princurve::principal_curve(x = emb, start = emb2)
s <- as.data.frame(pc$s)
colnames(s) <- c("x", "y")
emb0 <- cbind(emb0, s)
emb0 <- emb0[pc$ord, ]

ggplot(emb0)+ geom_point(aes(umap_1, umap_2, color = idents)) + geom_path(aes(x,y), color = "blue", linewidth=1)+ geom_point(data = emb2, aes(x, y), size=2)

Perform alternative splicing analysis along the cell trajetory

cells <- colnames(obj.sel)

exon1 <- ReadPISA("exon1/", prefix = "d1-rep1-", cells = cells)
exon2 <- ReadPISA("exon2/", prefix = "d1-rep2-", cells = cells)
exon <- mergeMatrix(exon1, exon2)

junction1 <- ReadPISA("junction1/", prefix = "d1-rep1-", cells = cells)
junction2 <- ReadPISA("junction2/", prefix = "d1-rep2-", cells = cells)
junction <- mergeMatrix(junction1, junction2)

obj.sel[['exon']] <- CreateAssayObject(exon, min.cells = 100)
obj.sel[['junction']] <- CreateAssayObject(junction, min.cells = 100)
rm(exon, exon1,exon2, junction, junction1, junction2)

order.cells <- rownames(emb0)

DefaultAssay(obj.sel) <- "exon"
obj.sel <- NormalizeData(obj.sel)

Warning: The `slot` argument of `SetAssayData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
ℹ The deprecated feature was likely used in the Seurat package.
  Please report the issue at <https://github.com/satijalab/seurat/issues>.

Warning: The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
ℹ The deprecated feature was likely used in the Seurat package.
  Please report the issue at <https://github.com/satijalab/seurat/issues>.

obj.sel <- ParseExonName(obj.sel)

When setup order cells, the weight matrix name will be named to “trajectory_wm”.

obj.sel <- RunAutoCorr(obj.sel, order.cells = order.cells, wm.name = "trajectory_wm")

Working on assay : exon

Run autocorrelation test for 72346 features.

Runtime : 9.748158 secs

58448 autocorrelated features.

obj.sel <- RunSDT(obj.sel, wm.name = "trajectory_wm", bind.name = "gene_name", bind.assay = "RNA")

Working on assay exon.

Working on binding assay RNA.

Use predefined weight matrix "trajectory_wm".

Processing 58448 features.

Processing 12417 binding features.

Retrieve binding data from assay RNA.

Use "data" layer for test features and binding features.

Using 7 threads.

Runtime : 6.944135 mins.

DefaultAssay(obj.sel) <- "junction"
obj.sel <- NormalizeData(obj.sel)
obj.sel <- ParseExonName(obj.sel)

Working on assay junction

obj.sel <- RunAutoCorr(obj.sel, order.cells = order.cells, wm.name = "trajectory_wm")

Working on assay : junction

Run autocorrelation test for 7583 features.

Runtime : 1.508061 secs

6466 autocorrelated features.

obj.sel <- RunSDT(obj.sel, wm.name = "trajectory_wm", bind.name = "gene_name", bind.assay = "RNA")

Working on assay junction.

Working on binding assay RNA.

Use predefined weight matrix "trajectory_wm".

Processing 6466 features.

Processing 3403 binding features.

Retrieve binding data from assay RNA.

Use "data" layer for test features and binding features.

Using 7 threads.

Runtime : 46.74309 secs.

FbtPlot(obj.sel, val = "gene_name.padj", assay = c("exon", "junction"), shape.by = "assay", color.by = "assay")

Meta(obj.sel, assay = "exon") %>% filter(gene_name.padj <1e-8) %>% knitr::kable()

	chr	start	end	gene_name	strand	moransi.pval	moransi	autocorr.variable	gene_name.D	gene_name.t
1:211398036-211399150/+/LINC00467	1	211398036	211399150	LINC00467	+	0.0000000	0.0816324	TRUE	0.6269152	7.875978
1:211398036-211399220/+/LINC00467	1	211398036	211399220	LINC00467	+	0.0000000	0.0797745	TRUE	0.6330680	8.161452
1:211398036-211399145/+/LINC00467	1	211398036	211399145	LINC00467	+	0.0000000	0.0804612	TRUE	0.6270202	7.833554
10:84371014-84372469/+/CCSER2	10	84371014	84372469	CCSER2	+	0.0000466	0.0459559	TRUE	0.6213102	8.800596
3:52414496-52414844/+/PHF7	3	52414496	52414844	PHF7	+	0.0000000	0.1310723	TRUE	0.6558172	7.956239
7:4984910-4985075/+/RBAK-RBAKDN	7	4984910	4985075	RBAK-RBAKDN	+	0.0000000	0.1330786	TRUE	0.6593649	8.421093
7:74196642-74197050/+/EIF4H	7	74196642	74197050	EIF4H	+	0.0000000	0.1397753	TRUE	0.7475285	14.535688
7:96645724-96645936/-/SEM1	7	96645724	96645936	SEM1	-	0.0000145	0.0494136	TRUE	0.6470772	10.301505
7:96645390-96645932/-/SEM1	7	96645390	96645932	SEM1	-	0.0000000	0.0720516	TRUE	0.6593465	10.585368
7:96645391-96645932/-/SEM1	7	96645391	96645932	SEM1	-	0.0000000	0.0748368	TRUE	0.6561704	10.424636
7:96645588-96645936/-/SEM1	7	96645588	96645936	SEM1	-	0.0000000	0.0796474	TRUE	0.6639002	10.867209
7:96645724-96645932/-/SEM1	7	96645724	96645932	SEM1	-	0.0002720	0.0407584	TRUE	0.6266415	9.058837
7:127591409-127591951/+/FSCN3	7	127591409	127591951	FSCN3	+	0.0000000	0.2950965	TRUE	0.8446530	13.659938
9:128500067-128501292/+/ODF2	9	128500067	128501292	ODF2	+	0.0023921	0.0331474	TRUE	0.6216168	8.123203
9:128500067-128501260/+/ODF2	9	128500067	128501260	ODF2	+	0.0023921	0.0331474	TRUE	0.6216168	8.123203
9:128500067-128500960/+/ODF2	9	128500067	128500960	ODF2	+	0.0020902	0.0336515	TRUE	0.6187131	7.971053

Meta(obj.sel, assay = "junction") %>% filter(gene_name.padj <1e-8) %>% knitr::kable()

	chr	start	end	gene_name	strand	moransi.pval	moransi	autocorr.variable	gene_name.D	gene_name.t	gene_name.pval	gene_name.padj
3:52414048-52414496/+/PHF7	3	52414048	52414496	PHF7	+	0	0.133177	TRUE	0.6908519	8.943853	0	0

FeaturePlot(obj.sel, features = c("3:52413996-52414048/+/PHF7", "3:52414048-52414496/+/PHF7", "PHF7"), ncol=3, order=TRUE)

Warning: Could not find 3:52413996-52414048/+/PHF7 in the default search
locations, found in 'exon' assay instead

Warning: Could not find PHF7 in the default search locations, found in 'RNA'
assay instead

Perform alternative splicing analysis for all cell space

To use all cell space, we do not need to specify a name for the weight matrix here, because it will default to using the PCA reduction map to generate the weight matrix. The resulting matrix will be named “reduction_wm”.

DefaultAssay(obj.sel) <- "exon"
obj.sel <- RunAutoCorr(obj.sel)

Working on assay : exon

Run autocorrelation test for 72346 features.

Runtime : 20.01246 secs

70882 autocorrelated features.

Now we have two weight matrices. If no name is specified in RunSDT(), the first matrix will always be used by default, so we need to explicitly specify “pca_wm” to avoid misusing the weight matrix here.

grep("_wm", names(obj.sel), value=TRUE)

[1] "trajectory_wm" "pca_wm"

obj.sel <- RunSDT(obj.sel, bind.name = "gene_name", bind.assay = "RNA", wm.name = "pca_wm", prefix="all")

Working on assay exon.

Working on binding assay RNA.

Use predefined weight matrix "pca_wm".

Processing 70882 features.

Processing 14088 binding features.

Retrieve binding data from assay RNA.

Use "data" layer for test features and binding features.

Using 7 threads.

Runtime : 11.06989 mins.

DefaultAssay(obj.sel) <- "junction"
obj.sel <- RunAutoCorr(obj.sel)

Working on assay : junction

Run autocorrelation test for 7583 features.

Runtime : 2.407756 secs

7545 autocorrelated features.

obj.sel <- RunSDT(obj.sel, bind.name = "gene_name", bind.assay = "RNA", wm.name = "pca_wm", prefix="all")

Working on assay junction.

Working on binding assay RNA.

Use predefined weight matrix "pca_wm".

Processing 7545 features.

Processing 3800 binding features.

Retrieve binding data from assay RNA.

Use "data" layer for test features and binding features.

Using 7 threads.

Runtime : 1.231873 mins.

By comparing the use of cell trajectory ordering with the PCA space, we found that using the PCA approach can detect more alternatively expressed genes. This is because forcing all cells into a 1D trajectory ordering preserves only the rank of each cell, but loses much of the neighborhood information about cells in the broader context. Since cell trajectories are typically constructed based on cell–cell distances in high-dimensional space, regardless of the algorithm used to build the trajectory, I personally recommend using the cell space directly rather than relying solely on 1D cell ordering, unless you have specific reasons or detailed knowledge.

FbtPlot(obj.sel, val = "all.padj", assay = c("exon", "junction"), shape.by = "assay", color.by = "assay")

Meta(obj.sel, assay = "exon") %>% filter(all.padj <1e-20) %>% knitr::kable()

	chr	start	end	gene_name	strand	moransi	autocorr.variable	gene_name.D	gene_name.t	gene_name.pval	gene_name.padj	all.D	all.t
1:25901491-25901682/-/STMN1	1	25901491	25901682	STMN1	-	0.5365729	TRUE	0.5138080	-0.6083112	0.7278129	1.0000000	0.5050367	15.15135
1:25901407-25901682/-/STMN1	1	25901407	25901682	STMN1	-	0.4763702	TRUE	0.4974101	-1.2050008	0.8844622	1.0000000	0.4729623	14.75713
1:244854973-244855044/-/HNRNPU	1	244854973	244855044	HNRNPU	-	0.1066201	TRUE	0.4263039	-5.3421718	0.9999997	1.0000000	0.4608809	15.76613
1:244855424-244855608/-/HNRNPU	1	244855424	244855608	HNRNPU	-	0.1499542	TRUE	0.4614148	-2.8587774	0.9974071	1.0000000	0.5291103	17.48624
1:244855424-244855597/-/HNRNPU	1	244855424	244855597	HNRNPU	-	0.1448451	TRUE	0.4657325	-2.5138832	0.9932233	1.0000000	0.5180136	17.42260
1:244855424-244856096/-/HNRNPU	1	244855424	244856096	HNRNPU	-	0.1290502	TRUE	0.4635919	-2.2568029	0.9868908	1.0000000	0.4924957	15.99049
10:35179134-35179769/+/CREM	10	35179134	35179769	CREM	+	0.2795176	TRUE	0.6571513	6.5368376	0.0000000	0.0000038	0.6140893	20.52634
10:35179134-35179847/+/CREM	10	35179134	35179847	CREM	+	0.2965156	TRUE	0.6671544	6.6315722	0.0000000	0.0000026	0.6322654	20.79021
10:35179134-35179843/+/CREM	10	35179134	35179843	CREM	+	0.2963525	TRUE	0.6664554	6.6308385	0.0000000	0.0000026	0.6310836	20.75944
10:84371014-84372469/+/CCSER2	10	84371014	84372469	CCSER2	+	0.0694474	TRUE	0.6213102	8.8005958	0.0000000	0.0000000	0.4298537	14.91362
13:27622875-27623449/+/POLR1D	13	27622875	27623449	POLR1D	+	0.1353060	TRUE	0.5669509	3.8114291	0.0001201	0.1754453	0.5305367	17.51706
13:27622841-27623452/+/POLR1D	13	27622841	27623452	POLR1D	+	0.1352448	TRUE	0.5677941	3.8753581	0.0000957	0.1434632	0.5308687	17.44830
15:80131180-80131293/+/ZFAND6	15	80131180	80131293	ZFAND6	+	0.1415461	TRUE	0.5294575	2.4872262	0.0072731	1.0000000	0.4809282	15.51713
15:80129600-80131293/+/ZFAND6	15	80129600	80131293	ZFAND6	+	0.1130790	TRUE	0.5055989	1.6040236	0.0559472	1.0000000	0.4281787	16.85202
15:80137480-80137678/+/ZFAND6	15	80137480	80137678	ZFAND6	+	0.1506118	TRUE	0.5313132	2.8853403	0.0023996	1.0000000	0.4767508	15.08482
15:80137480-80137811/+/ZFAND6	15	80137480	80137811	ZFAND6	+	0.1501969	TRUE	0.5089926	1.1308509	0.1304250	1.0000000	0.4761354	14.67508
16:1883196-1883296/-/MEIOB	16	1883196	1883296	MEIOB	-	0.1565362	TRUE	0.5540380	4.4186253	0.0000127	0.0218663	0.4252987	14.58586
16:1884097-1884294/-/MEIOB	16	1884097	1884294	MEIOB	-	0.1760430	TRUE	0.5573449	4.7002546	0.0000042	0.0074443	0.4450364	16.37583
17:1345258-1345499/-/YWHAE	17	1345258	1345499	YWHAE	-	0.0589847	TRUE	0.6035363	7.2019361	0.0000000	0.0000002	0.4505080	14.75010
17:75778810-75778963/-/H3-3B	17	75778810	75778963	H3-3B	-	0.6555098	TRUE	0.4775407	-2.4017911	0.9909098	1.0000000	0.4657620	14.05098
17:75778810-75779184/-/H3-3B	17	75778810	75779184	H3-3B	-	0.6255431	TRUE	0.4735932	-2.5603642	0.9940165	1.0000000	0.4537018	13.30349
17:82615718-82616025/-/WDR45B	17	82615718	82616025	WDR45B	-	0.1523974	TRUE	0.4596537	-1.4446807	0.9241474	1.0000000	0.4633808	13.58817
18:49482152-49482410/-/RPL17-C18orf32	18	49482152	49482410	RPL17-C18orf32	-	0.1626891	TRUE	0.5976995	2.9204659	0.0021642	1.0000000	0.5137507	16.14814
18:49482176-49482410/-/RPL17-C18orf32	18	49482176	49482410	RPL17-C18orf32	-	0.1629240	TRUE	0.5960931	2.8858653	0.0023959	1.0000000	0.5137443	16.25708
18:49483584-49483771/-/RPL17-C18orf32	18	49483584	49483771	RPL17-C18orf32	-	0.2033912	TRUE	0.5729379	2.4115739	0.0088634	1.0000000	0.4885709	14.70143
2:61141592-61143945/-/C2orf74-AS1	2	61141592	61143945	C2orf74-AS1	-	0.1129809	TRUE	0.6125111	6.8532884	0.0000000	0.0000010	0.4952707	15.55161
20:35774488-35775058/+/PHF20	20	35774488	35775058	PHF20	+	0.0693745	TRUE	0.5246998	3.7053294	0.0001739	0.2479685	0.4130324	15.00582
3:45007420-45007575/+/EXOSC7	3	45007420	45007575	EXOSC7	+	0.1241484	TRUE	0.4985161	1.2424687	0.1084993	1.0000000	0.4331207	14.18691
3:45011235-45011393/+/EXOSC7	3	45011235	45011393	EXOSC7	+	0.1563919	TRUE	0.5202833	1.9727128	0.0256586	1.0000000	0.4716648	14.73913
3:45011235-45011471/+/EXOSC7	3	45011235	45011471	EXOSC7	+	0.1772089	TRUE	0.5317377	2.4737189	0.0075368	1.0000000	0.5021152	16.29195
3:45011235-45011470/+/EXOSC7	3	45011235	45011470	EXOSC7	+	0.1777901	TRUE	0.5333368	2.5716212	0.0058044	1.0000000	0.5041374	16.46894
3:52413996-52414048/+/PHF7	3	52413996	52414048	PHF7	+	0.1410946	TRUE	0.6335731	6.2777400	0.0000000	0.0000122	0.4988230	14.70155
3:52414496-52414844/+/PHF7	3	52414496	52414844	PHF7	+	0.1455321	TRUE	0.6558172	7.9562386	0.0000000	0.0000000	0.5357904	17.07624
5:28810519-28810897/+/LINC02109	5	28810519	28810897	LINC02109	+	0.1095984	TRUE	0.5522362	4.8399018	0.0000024	0.0045116	0.4643932	14.33751
7:2234198-2235564/-/ENSG00000286192	7	2234198	2235564	ENSG00000286192	-	0.1514527	TRUE	0.5234723	2.4383464	0.0082683	1.0000000	0.4600698	15.10529
7:4984910-4985075/+/RBAK-RBAKDN	7	4984910	4985075	RBAK-RBAKDN	+	0.1729539	TRUE	0.6593649	8.4210932	0.0000000	0.0000000	0.5476979	19.67029
7:5779985-5780292/-/RNF216	7	5779985	5780292	RNF216	-	0.0640424	TRUE	0.5246596	3.5873547	0.0002606	0.3384448	0.4067393	14.01626
7:74196642-74197050/+/EIF4H	7	74196642	74197050	EIF4H	+	0.1499599	TRUE	0.7475285	14.5356884	0.0000000	0.0000000	0.6529201	26.76167
7:92244002-92244149/-/ENSG00000285953	7	92244002	92244149	ENSG00000285953	-	0.2155620	TRUE	0.5857502	4.7302274	0.0000037	0.0068079	0.5361302	16.49559
7:92244902-92245134/-/ENSG00000285953	7	92244902	92245134	ENSG00000285953	-	0.1772488	TRUE	0.5634043	3.8898895	0.0000909	0.1398010	0.4787132	13.47767
7:96645724-96645936/-/SEM1	7	96645724	96645936	SEM1	-	0.0720968	TRUE	0.6470772	10.3015054	0.0000000	0.0000000	0.5103109	17.74491
7:96645390-96645932/-/SEM1	7	96645390	96645932	SEM1	-	0.0827188	TRUE	0.6593465	10.5853680	0.0000000	0.0000000	0.5265752	18.97728
7:96645391-96645932/-/SEM1	7	96645391	96645932	SEM1	-	0.0817727	TRUE	0.6561704	10.4246355	0.0000000	0.0000000	0.5217723	18.72397
7:96645588-96645936/-/SEM1	7	96645588	96645936	SEM1	-	0.0857761	TRUE	0.6639002	10.8672094	0.0000000	0.0000000	0.5333738	19.05890
7:96645724-96645932/-/SEM1	7	96645724	96645932	SEM1	-	0.0615317	TRUE	0.6266415	9.0588370	0.0000000	0.0000000	0.4757020	16.33801
7:127591409-127591951/+/FSCN3	7	127591409	127591951	FSCN3	+	0.3332084	TRUE	0.8446530	13.6599377	0.0000000	0.0000000	0.8517817	28.91322
8:101198726-101199239/-/ZNF706	8	101198726	101199239	ZNF706	-	0.1575763	TRUE	0.5652140	2.8498588	0.0026610	1.0000000	0.5134903	15.38145
9:79571773-79572835/+/TLE4	9	79571773	79572835	TLE4	+	0.3314505	TRUE	0.4805687	-1.5151168	0.9335354	1.0000000	0.4487088	14.19573
9:79571941-79572835/+/TLE4	9	79571941	79572835	TLE4	+	0.3425540	TRUE	0.5324464	1.8857681	0.0311289	1.0000000	0.4993045	16.19725
9:79571965-79572835/+/TLE4	9	79571965	79572835	TLE4	+	0.3129983	TRUE	0.5223707	1.5287304	0.0647601	1.0000000	0.4732105	15.54075
9:124858458-124858537/-/RPL35	9	124858458	124858537	RPL35	-	0.3502975	TRUE	0.5369936	2.1858733	0.0155911	1.0000000	0.4623329	13.77950
9:128500067-128501292/+/ODF2	9	128500067	128501292	ODF2	+	0.0466525	TRUE	0.6216168	8.1232027	0.0000000	0.0000000	0.4261405	13.24331
9:128500067-128501260/+/ODF2	9	128500067	128501260	ODF2	+	0.0466525	TRUE	0.6216168	8.1232027	0.0000000	0.0000000	0.4261405	13.24331

Meta(obj.sel, assay = "junction") %>% filter(all.padj <1e-20) %>% knitr::kable()

	chr	start	end	gene_name	strand	moransi	autocorr.variable	gene_name.D	gene_name.t	gene_name.pval	gene_name.padj	all.D	all.t
1:244855044-244855424/-/HNRNPU	1	244855044	244855424	HNRNPU	-	0.1372512	TRUE	0.4599386	-3.751276	0.9998517	1.0000000	0.4784554	15.26365
15:80131293-80137480/+/ZFAND6	15	80131293	80137480	ZFAND6	+	0.1457399	TRUE	0.5318035	3.241519	0.0008102	0.7483565	0.4457714	15.09220
3:45007575-45011235/+/EXOSC7	3	45007575	45011235	EXOSC7	+	0.1611300	TRUE	0.5440651	3.856204	0.0001025	0.1325229	0.4670253	13.95661
3:52412920-52413996/+/PHF7	3	52412920	52413996	PHF7	+	0.1512534	TRUE	0.6429861	7.607239	0.0000000	0.0000000	0.5163198	16.13420
3:52414048-52414496/+/PHF7	3	52414048	52414496	PHF7	+	0.1885870	TRUE	0.6908519	8.943853	0.0000000	0.0000000	0.5930035	18.84522
7:4983773-4984910/+/RBAK-RBAKDN	7	4983773	4984910	RBAK-RBAKDN	+	0.1228286	TRUE	0.6280600	7.217475	0.0000000	0.0000001	0.4648469	14.47706
9:124858067-124858458/-/RPL35	9	124858067	124858458	RPL35	-	0.3729386	TRUE	0.5382251	1.941759	0.0275044	1.0000000	0.4623314	13.51239

Perform alternative splicing on a supervised two dimension space

We can also perform the analysis using a supervised two-dimensional cell space. As an example, we utilize Seurat’s built-in cell cycle scoring scheme for demonstration purposes.

DefaultAssay(obj.sel) <- "RNA"

s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes

obj.sel[['Group']] <- Idents(obj.sel)
obj.sel <- CellCycleScoring(obj.sel, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)

Warning: The following features are not present in the object: MLF1IP, E2F8,
not searching for symbol synonyms

Warning: The following features are not present in the object: FAM64A, AURKB,
HN1, not searching for symbol synonyms

df <- obj.sel[[]]

p1 <- ggplot(df) + geom_point(aes(x=S.Score, y = G2M.Score, color = Phase))
p2 <- ggplot(df) + geom_point(aes(x=S.Score, y = G2M.Score, color = Group))
p1 + p2

# Create cell cycle coordinate space with the module scores
ccc <- as.matrix(df[, c("S.Score", "G2M.Score")])
head(ccc)

                              S.Score   G2M.Score
d1-rep1-GACAGAGTCATACGGT-1 0.37908301  0.27462393
d1-rep1-GCTCCTATCATCGCTC-1 0.40969930  0.24794880
d1-rep1-GTAGTCATCCGAACGC-1 0.06002289  0.09029104
d1-rep1-TTTCCTCCACTATCTT-1 0.37392161  0.02509706
d1-rep1-TACGGTAGTAAGGGCT-1 0.51073399  0.28205031
d1-rep1-AGGGTGAAGATATGCA-1 0.24597679 -0.02650419

colnames(ccc) <- c("CCC_1", "CCC_2")

obj.sel[['ccc']] <- CreateDimReducObject(embeddings = ccc, assay = "RNA")

DefaultAssay(obj.sel) <- "exon"
obj.sel <- RunAutoCorr(obj.sel, reduction = "ccc", wm.name = "ccc_wm", dims=1:2)

Working on assay : exon

Run autocorrelation test for 72346 features.

Runtime : 15.69235 secs

60820 autocorrelated features.

obj.sel <- RunSDT(obj.sel, wm.name = "ccc_wm", bind.name = "gene_name", bind.assay = "RNA", prefix = "ccc")

Working on assay exon.

Working on binding assay RNA.

Use predefined weight matrix "ccc_wm".

Processing 60820 features.

Processing 12689 binding features.

Retrieve binding data from assay RNA.

Use "data" layer for test features and binding features.

Using 7 threads.

Runtime : 8.745614 mins.

DefaultAssay(obj.sel) <- "junction"
obj.sel <- RunAutoCorr(obj.sel, reduction = "ccc", wm.name = "ccc_wm", dims=1:2)

Working on assay : junction

Run autocorrelation test for 7583 features.

Runtime : 1.545865 secs

6884 autocorrelated features.

obj.sel <- RunSDT(obj.sel, wm.name = "ccc_wm", bind.name = "gene_name", bind.assay = "RNA", prefix = "ccc")

Working on assay junction.

Working on binding assay RNA.

Use predefined weight matrix "ccc_wm".

Processing 6884 features.

Processing 3549 binding features.

Retrieve binding data from assay RNA.

Use "data" layer for test features and binding features.

Using 7 threads.

Runtime : 53.14298 secs.

FbtPlot(obj.sel, val = "ccc.padj", assay = c("exon", "junction"), shape.by = "assay", color.by = "assay")

Meta(obj.sel, assay = "exon") %>% filter(ccc.padj<1e-30) %>% knitr::kable()

	chr	start	end	gene_name	strand	moransi	autocorr.variable	gene_name.D	gene_name.t	gene_name.pval	gene_name.padj	all.D	all.t	ccc.D	ccc.t
10:35179134-35179769/+/CREM	10	35179134	35179769	CREM	+	0.1660249	TRUE	0.6571513	6.536838	0.0000000	3.8e-06	0.6140893	20.52634	0.6498339	21.80302
10:35179134-35179847/+/CREM	10	35179134	35179847	CREM	+	0.1783511	TRUE	0.6671544	6.631572	0.0000000	2.6e-06	0.6322654	20.79021	0.6774472	23.44110
10:35179134-35179843/+/CREM	10	35179134	35179843	CREM	+	0.1765705	TRUE	0.6664554	6.630839	0.0000000	2.6e-06	0.6310836	20.75944	0.6755427	23.55242
3:45011235-45011471/+/EXOSC7	3	45011235	45011471	EXOSC7	+	0.1252058	TRUE	0.5317377	2.473719	0.0075368	1.0e+00	0.5021152	16.29195	0.5183262	19.60756
3:45011235-45011470/+/EXOSC7	3	45011235	45011470	EXOSC7	+	0.1252657	TRUE	0.5333368	2.571621	0.0058044	1.0e+00	0.5041374	16.46894	0.5193649	19.73390
7:74196642-74197050/+/EIF4H	7	74196642	74197050	EIF4H	+	0.0903061	TRUE	0.7475285	14.535688	0.0000000	0.0e+00	0.6529201	26.76167	0.6038284	19.55842
7:127591409-127591951/+/FSCN3	7	127591409	127591951	FSCN3	+	0.1644589	TRUE	0.8446530	13.659938	0.0000000	0.0e+00	0.8517817	28.91322	0.8107162	23.39077

FeaturePlot(obj.sel, reduction =  "ccc", features = c("7:74196642-74197050/+/EIF4H", "EIF4H"), order = TRUE)

Warning: Could not find 7:74196642-74197050/+/EIF4H in the default search
locations, found in 'exon' assay instead

Warning: Could not find EIF4H in the default search locations, found in 'RNA'
assay instead

gtf <- gtf2db("./Homo_sapiens.GRCh38.114.chr.gtf.gz")

[2025-06-19 18:42:26] GTF loading..
[2025-06-19 18:42:52] Load 78686 genes.

bamfiles <- list(rep1 = "./donor1_rep1.bam", rep2 = "./donor1_rep2.bam")
cell.list <- split(obj.sel$Phase, obj.sel$replicate)

cell.list1 <- lapply(cell.list, function(x) {
  nm <- names(x)
  names(x) <- gsub(".*rep[12]-(.*)","\\1",nm)
  x
})
str(cell.list1)

List of 2
 $ rep1: Named chr [1:954] "S" "S" "G2M" "S" ...
  ..- attr(*, "names")= chr [1:954] "GACAGAGTCATACGGT-1" "GCTCCTATCATCGCTC-1" "GTAGTCATCCGAACGC-1" "TTTCCTCCACTATCTT-1" ...
 $ rep2: Named chr [1:2023] "S" "S" "S" "S" ...
  ..- attr(*, "names")= chr [1:2023] "GGCTCGACATGTCTCC-1" "CCTATTAGTCCAGTAT-1" "GTAACGTCACTTACGA-1" "AAGGCAGCACGTCTCT-1" ...

TrackPlot(bamfile = bamfiles, cell.group = cell.list1, gtf = gtf, gene = "EIF4H", highlights = c(74196642,74197050), junc=TRUE)

Command(obj.sel)

 [1] "NormalizeData.RNA"            "FindVariableFeatures.RNA"    
 [3] "ScaleData.RNA"                "RunPCA.RNA"                  
 [5] "FindNeighbors.RNA.pca"        "FindClusters"                
 [7] "RunUMAP.RNA.pca"              "NormalizeData.exon"          
 [9] "ParseExonName.exon"           "NormalizeData.junction"      
[11] "ParseExonName.junction"       "RunAutoCorr.exon.pca"        
[13] "RunAutoCorr.junction.pca"     "RunAutoCorr.exon.ccc"        
[15] "SetAutoCorrFeatures.exon"     "RunSDT.exon"                 
[17] "RunAutoCorr.junction.ccc"     "SetAutoCorrFeatures.junction"
[19] "RunSDT.junction"

sessionInfo()

R version 4.5.0 (2025-04-11)
Platform: aarch64-apple-darwin24.4.0
Running under: macOS Sequoia 15.5

Matrix products: default
BLAS:   /opt/homebrew/Cellar/openblas/0.3.29/lib/libopenblasp-r0.3.29.dylib 
LAPACK: /opt/homebrew/Cellar/r/4.5.0/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] princurve_2.1.6    future_1.49.0      dplyr_1.1.4        Seurat_5.3.0      
[5] SeuratObject_5.1.0 sp_2.2-0           ggplot2_3.5.2      Yano_1.2          

loaded via a namespace (and not attached):
  [1] deldir_2.0-4           pbapply_1.7-2          gridExtra_2.3         
  [4] rlang_1.1.6            magrittr_2.0.3         RcppAnnoy_0.0.22      
  [7] spatstat.geom_3.3-6    matrixStats_1.5.0      ggridges_0.5.6        
 [10] compiler_4.5.0         png_0.1-8              vctrs_0.6.5           
 [13] reshape2_1.4.4         stringr_1.5.1          pkgconfig_2.0.3       
 [16] fastmap_1.2.0          labeling_0.4.3         promises_1.3.2        
 [19] rmarkdown_2.29         purrr_1.0.4            xfun_0.52             
 [22] jsonlite_2.0.0         goftest_1.2-3          later_1.4.2           
 [25] spatstat.utils_3.1-4   irlba_2.3.5.1          parallel_4.5.0        
 [28] cluster_2.1.8.1        R6_2.6.1               ica_1.0-3             
 [31] stringi_1.8.7          RColorBrewer_1.1-3     spatstat.data_3.1-6   
 [34] reticulate_1.42.0      spatstat.univar_3.1-3  parallelly_1.44.0     
 [37] lmtest_0.9-40          scattermore_1.2        Rcpp_1.0.14           
 [40] knitr_1.50             tensor_1.5             future.apply_1.11.3   
 [43] zoo_1.8-14             R.utils_2.13.0         sctransform_0.4.2     
 [46] httpuv_1.6.16          Matrix_1.7-3           splines_4.5.0         
 [49] igraph_2.1.4           tidyselect_1.2.1       viridis_0.6.5         
 [52] abind_1.4-8            yaml_2.3.10            spatstat.random_3.3-3 
 [55] codetools_0.2-20       miniUI_0.1.2           spatstat.explore_3.4-2
 [58] listenv_0.9.1          lattice_0.22-6         tibble_3.2.1          
 [61] plyr_1.8.9             withr_3.0.2            shiny_1.10.0          
 [64] ROCR_1.0-11            evaluate_1.0.3         Rtsne_0.17            
 [67] fastDummies_1.7.5      survival_3.8-3         polyclip_1.10-7       
 [70] fitdistrplus_1.2-2     pillar_1.10.2          KernSmooth_2.23-26    
 [73] plotly_4.10.4          generics_0.1.4         RcppHNSW_0.6.0        
 [76] scales_1.4.0           gtools_3.9.5           globals_0.18.0        
 [79] xtable_1.8-4           glue_1.8.0             lazyeval_0.2.2        
 [82] tools_4.5.0            data.table_1.17.2      RSpectra_0.16-2       
 [85] RANN_2.6.2             dotCall64_1.2          cowplot_1.1.3         
 [88] grid_4.5.0             tidyr_1.3.1            nlme_3.1-168          
 [91] patchwork_1.3.0        cli_3.6.5              spatstat.sparse_3.1-0 
 [94] spam_2.11-1            viridisLite_0.4.2      uwot_0.2.3            
 [97] gtable_0.3.6           R.methodsS3_1.8.2      digest_0.6.37         
[100] progressr_0.15.1       ggrepel_0.9.6          htmlwidgets_1.6.4     
[103] farver_2.1.2           htmltools_0.5.8.1      R.oo_1.27.1           
[106] lifecycle_1.0.4        httr_1.4.7             mime_0.13             
[109] MASS_7.3-65