Alternative splicing analysis for single cell RNA sequencing

This vignette uses the gene, exon, and junction expression files generated in the Annotate Various Features for Alignment vignette. Although current state-of-the-art scRNA-seq methods tend to be biased towards the 3’ or 5’ ends of transcripts, it is still possible to obtain coverage information for a subset of exons. Gene and exon expression is sparse in single cells, but our spatial dissimilarity test leverages the spatial distribution of features, so even features with low overall expression but strong spatial expression patterns across cells can be highlighted. By performing a spatial dissimilarity test between exon/junction expression and gene expression, we can predict potential alternative splicing events.

0. Prerequisite

Make sure you have installed R and the Yano package before proceeding.

1. Perform cell clustering with Seurat

Loading Yano automatically loads Seurat.

require(Yano)

Loading required package: Yano

── Attaching packages ────────────────────────────────────────────── Yano 1.2 ──
✔ dplyr   1.1.4     ✔ Seurat  5.3.0
✔ ggplot2 3.5.2

# Read raw gene expression matrix
exp <- ReadPISA("./exp/")

In this section, we will perform the standard Seurat analysis pipeline. The spatial dissimilarity test does not rely on cell clustering, so changing the resolution or other parameters of FindClusters and RunUMAP will not affect its outcome.

# Create Seurat object and filter droplets with fewer than 1000 genes
obj <- CreateSeuratObject(exp, min.features = 1000, min.cells = 10)

# Filter low quality droplets
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
obj <- subset(obj, nFeature_RNA < 9000 & percent.mt < 20)

# Downsampling to 2000 cells for fast testing
obj <- obj[, sample(colnames(obj),2000)]

# Run the cell clustering analysis with the Seurat pipeline
obj <- NormalizeData(obj) %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA(verbose = FALSE) %>%
  FindNeighbors(dims = 1:10, verbose = FALSE) %>%
  FindClusters(resolution = 0.5, verbose = FALSE) %>%
  RunUMAP(dims = 1:10, verbose = FALSE)

Normalizing layer: counts

Finding variable features for layer counts

Centering and scaling data matrix

DimPlot(obj, label=TRUE, label.size = 5, label.box = TRUE)

2. Perform alternative splicing analysis with the exon assay

In this section, we will compare exon expression patterns with the expression patterns of their corresponding genes in a spatial context. Here, the term “spatial” refers to the organization of cells in space. In this vignette, we will use the PCA space for the analysis, but the approach can also be applied to lineage trajectories, spatial coordinates or integration space such as harmony. The spatial dissimilarity test is divided into several steps. First, we will load exon data as a new assay in the Seurat object. In the second step, we will perform a spatial autocorrelation test for all exons and select the ones that show significant autocorrelation for further analysis. Next, we will define the binding relationship between exons and their corresponding genes and run the spatial dissimilarity test. After testing, P values and adjusted P values for each exon will be provided.

# Read exon count matrix file
exon <- ReadPISA("./exon/")

# Load the exon expression to Seurat object as a new assay, make sure the exon matrix has the same cells.
obj[['exon']] <- CreateAssayObject(exon[, colnames(obj)], min.cells=20)

# Switch work assay to exon
DefaultAssay(obj) <- "exon"
# Empty info for exon features
head(obj[['exon']][[]]) %>% knitr::kable()

chr1:135141-135895/-/ENSG00000268903
chr1:629640-630683/+/MTND2P28
chr1:631074-632616/+/MTCO1P12
chr1:632757-633438/+/MTCO2P12
chr1:633696-634376/+/MTATP6P1
chr1:634376-634922/+/MTCO3P12

obj <- ParseExonName(obj)

Working on assay exon

# Gene name and location are parsed from exon name
head(obj[['exon']][[]]) %>% knitr::kable()

	chr	start	end	gene_name	strand
chr1:135141-135895/-/ENSG00000268903	chr1	135141	135895	ENSG00000268903	-
chr1:629640-630683/+/MTND2P28	chr1	629640	630683	MTND2P28	+
chr1:631074-632616/+/MTCO1P12	chr1	631074	632616	MTCO1P12	+
chr1:632757-633438/+/MTCO2P12	chr1	632757	633438	MTCO2P12	+
chr1:633696-634376/+/MTATP6P1	chr1	633696	634376	MTATP6P1	+
chr1:634376-634922/+/MTCO3P12	chr1	634376	634922	MTCO3P12	+
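
The table above shows that each feature name encodes its own annotation as chr:start-end/strand/gene. As a minimal sketch of that naming convention, the base R helper below splits such names into the same columns. The function parse_feature_name is hypothetical and written only for illustration; it is not how ParseExonName is implemented.

```r
# Hypothetical helper (not part of Yano): split "chr:start-end/strand/gene".
parse_feature_name <- function(x) {
  pattern <- "^([^:]+):([0-9]+)-([0-9]+)/([+-])/(.+)$"
  stopifnot(all(grepl(pattern, x)))  # fail early on unexpected names
  data.frame(
    chr       = sub(pattern, "\\1", x),
    start     = as.integer(sub(pattern, "\\2", x)),
    end       = as.integer(sub(pattern, "\\3", x)),
    gene_name = sub(pattern, "\\5", x),
    strand    = sub(pattern, "\\4", x),
    row.names = x,
    stringsAsFactors = FALSE
  )
}

# Two feature names copied from the table above.
parsed <- parse_feature_name(c(
  "chr1:135141-135895/-/ENSG00000268903",
  "chr1:629640-630683/+/MTND2P28"
))
print(parsed)
```

Because the gene name is the last field, gene symbols that contain a hyphen (such as RPL17-C18orf32) are still captured whole.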

# Normalize the data for spatial autocorrelation test
obj <- NormalizeData(obj)
obj <- RunAutoCorr(obj)

Working on assay : exon

Run autocorrelation test for 117043 features.

Runtime : 19.62422 secs

70909 autocorrelated features.
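
To build some intuition for what an autocorrelation test measures, the sketch below computes Moran's I (presumably the statistic behind the moransi column reported later) for a toy feature on a toy embedding. The binary k-nearest-neighbour weight matrix and both helper functions are assumptions made for illustration; how Yano builds its own weight matrix is not described in this vignette.

```r
# Moran's I for a feature x given a cell-by-cell weight matrix w.
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Hypothetical binary kNN weights built from an embedding (rows = cells).
knn_weights <- function(embedding, k = 5) {
  d <- as.matrix(dist(embedding))
  diag(d) <- Inf                       # a cell is not its own neighbour
  w <- matrix(0, nrow(d), ncol(d))
  for (i in seq_len(nrow(d))) {
    w[i, order(d[i, ])[seq_len(k)]] <- 1
  }
  w
}

set.seed(1)
emb <- matrix(rnorm(200 * 2), ncol = 2)        # toy 2-D "PCA" coordinates
w <- knn_weights(emb, k = 5)

patterned <- emb[, 1] + rnorm(200, sd = 0.1)   # varies smoothly along axis 1
shuffled  <- sample(patterned)                 # same values, structure removed

i_patterned <- morans_i(patterned, w)
i_shuffled  <- morans_i(shuffled, w)
print(c(patterned = i_patterned, shuffled = i_shuffled))
```

The two features have identical value distributions, so only the arrangement of values across neighbouring cells separates them. This is why a lowly expressed feature can still be selected if its expression is spatially organised.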

The permutation process can be computationally expensive. In the example below, I set perm=20 to perform only 20 permutations for quicker results. However, the default setting runs 100 permutations for more accurate evaluation. While it’s possible to increase the number of permutations for even more precision, it may not always be necessary. If you’re running Yano for the first time on your dataset, setting perm=20 can help you save time and provide an initial overview of the entire dataset.
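
As a generic illustration of why the number of permutations matters, the sketch below runs a plain empirical permutation test on a toy statistic. This is not Yano's procedure: the p-values reported in this vignette fall far below 1/21 even with perm=20, so RunSDT evidently derives them differently (the gene_name.t column hints at a t-type statistic, which is an inference from the output rather than a documented fact here). The helper perm_pvalue and the toy data are hypothetical.

```r
# Generic empirical permutation p-value (an illustration, not Yano's code).
perm_pvalue <- function(observed, null_stats) {
  (1 + sum(null_stats >= observed)) / (length(null_stats) + 1)
}

set.seed(42)
x <- rnorm(100)
y <- x + rnorm(100)                    # toy pair of related features
observed <- cor(x, y)

for (perm in c(20, 100)) {
  # Shuffling y breaks the pairing and gives draws from the null distribution.
  null_stats <- replicate(perm, cor(x, sample(y)))
  cat(sprintf("perm = %3d  smallest possible p = %.4f  empirical p = %.4f\n",
              perm, 1 / (perm + 1), perm_pvalue(observed, null_stats)))
}
```

Whatever summary is applied to the permuted statistics, more permutations describe the null distribution more precisely, at a cost that grows linearly with perm.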

obj <- RunSDT(obj, bind.name = "gene_name", bind.assay = "RNA", perm=20)

Working on assay exon.

Working on binding assay RNA.

Use predefined weight matrix "pca_wm".

Processing 70909 features.

Processing 14881 binding features.

Retrieve binding data from assay RNA.

Use "data" layer for test features and binding features.

Using 7 threads.

Runtime : 2.405475 mins.

# Now p values and adjusted p values have been generated
head(obj[['exon']][[]]) %>% knitr::kable()

	chr	start	end	gene_name	strand	moransi.pval	moransi	autocorr.variable	gene_name.D	gene_name.t	gene_name.pval	gene_name.padj
chr1:135141-135895/-/ENSG00000268903	chr1	135141	135895	ENSG00000268903	-	0.0000001	0.0222427	TRUE	0.0002562	-7.275754	0.9999997	1
chr1:629640-630683/+/MTND2P28	chr1	629640	630683	MTND2P28	+	0.0000013	0.0203117	TRUE	0.0017026	-8.030082	0.9999999	1
chr1:631074-632616/+/MTCO1P12	chr1	631074	632616	MTCO1P12	+	0.7821273	-0.0038815	FALSE	NA	NA	NA	NA
chr1:632757-633438/+/MTCO2P12	chr1	632757	633438	MTCO2P12	+	0.5700335	-0.0012501	FALSE	NA	NA	NA	NA
chr1:633696-634376/+/MTATP6P1	chr1	633696	634376	MTATP6P1	+	0.0000000	0.3171474	TRUE	0.0032809	-7.659511	0.9999998	1
chr1:634376-634922/+/MTCO3P12	chr1	634376	634922	MTCO3P12	+	0.8377380	-0.0046027	FALSE	NA	NA	NA	NA

# Plot feature binding test plot
FbtPlot(obj, val = "gene_name.padj")

The chromosome names are too long and tend to overlap in the visualization. To resolve this, you can either resize the labels or remove the ‘chr’ prefix from the chromosome names. Additionally, since the Y chromosome and the mitochondrial chromosome are not of particular interest in this analysis, they can be excluded from the visualization.

# Keep the 22 autosomes and chromosome X
sel.chrs <- c(1:22, "X")
FbtPlot(obj, val = "gene_name.padj", remove.chr = TRUE, sel.chrs = sel.chrs)

# Let's see how many exons show a spatial expression pattern that differs from that of their genes
obj[['exon']][[]] %>% filter(gene_name.padj < 0.001) %>% knitr::kable()

	chr	start	end	gene_name	strand	moransi.pval	moransi	autocorr.variable	gene_name.D	gene_name.t	gene_name.pval	gene_name.padj
chr11:35197162-35197793/+/CD44	chr11	35197162	35197793	CD44	+	0.0000000	0.3402391	TRUE	0.3346163	9.289870	0e+00	0.0000317
chr11:75421727-75422280/+/RPS3	chr11	75421727	75422280	RPS3	+	0.0000000	0.0493364	TRUE	0.2921820	7.352049	3e-07	0.0006783
chr11:123060825-123061329/-/HSPA8	chr11	123060825	123061329	HSPA8	-	0.0000000	0.1313983	TRUE	0.2745198	9.259444	0e+00	0.0000317
chr12:6943817-6944173/+/C12orf57	chr12	6943817	6944173	C12orf57	+	0.0000000	0.0652261	TRUE	0.3459513	8.895493	0e+00	0.0000540
chr12:53303034-53306861/+/ENSG00000288663	chr12	53303034	53306861	ENSG00000288663	+	0.0000000	0.0259674	TRUE	0.3661311	7.157851	4e-07	0.0009298
chr12:53306680-53306861/+/ENSG00000288663	chr12	53306680	53306861	ENSG00000288663	+	0.0000000	0.0244351	TRUE	0.3681630	7.186346	4e-07	0.0009075
chr12:56161387-56161465/+/MYL6	chr12	56161387	56161465	MYL6	+	0.0000000	0.2516071	TRUE	0.7751959	23.317591	0e+00	0.0000000
chr12:79872808-79872938/-/PPP1R12A	chr12	79872808	79872938	PPP1R12A	-	0.0000000	0.0556598	TRUE	0.3002225	8.083645	1e-07	0.0001958
chr15:43801518-43801569/+/SERF2	chr15	43801518	43801569	SERF2	+	0.0000000	0.0974296	TRUE	0.3683830	7.801022	1e-07	0.0002985
chr15:43801711-43804427/+/SERF2	chr15	43801711	43804427	SERF2	+	0.0000000	0.1182703	TRUE	0.4634172	9.627454	0e+00	0.0000214
chr18:49481681-49482410/-/RPL17-C18orf32	chr18	49481681	49482410	RPL17-C18orf32	-	0.0000000	0.2771721	TRUE	0.9619640	40.306543	0e+00	0.0000000
chr19:16093684-16093753/+/TPM4	chr19	16093684	16093753	TPM4	+	0.0000000	0.1013825	TRUE	0.3247428	10.275068	0e+00	0.0000080
chr19:16095264-16095357/+/TPM4	chr19	16095264	16095357	TPM4	+	0.0000000	0.1949699	TRUE	0.5515461	19.018562	0e+00	0.0000000
chr19:16095264-16096744/+/TPM4	chr19	16095264	16096744	TPM4	+	0.0000000	0.2477580	TRUE	0.5977548	23.910418	0e+00	0.0000000
chr19:16095264-16095454/+/TPM4	chr19	16095264	16095454	TPM4	+	0.0000000	0.2432229	TRUE	0.6049650	28.141821	0e+00	0.0000000
chr19:16095264-16095893/+/TPM4	chr19	16095264	16095893	TPM4	+	0.0000000	0.2506281	TRUE	0.5989865	24.155913	0e+00	0.0000000
chr19:16095264-16095591/+/TPM4	chr19	16095264	16095591	TPM4	+	0.0000000	0.2518667	TRUE	0.6011723	24.235526	0e+00	0.0000000
chr19:16095264-16095496/+/TPM4	chr19	16095264	16095496	TPM4	+	0.0000000	0.2582948	TRUE	0.6168771	28.105457	0e+00	0.0000000
chr19:18169087-18170494/+/ENSG00000268173	chr19	18169087	18170494	ENSG00000268173	+	0.0000000	0.0506799	TRUE	0.4966975	20.652615	0e+00	0.0000000
chr2:218344808-218346793/+/PNKD	chr2	218344808	218346793	PNKD	+	0.0000000	0.0899427	TRUE	0.4868878	11.778748	0e+00	0.0000009
chr2:218344808-218346784/+/PNKD	chr2	218344808	218346784	PNKD	+	0.0000000	0.0889706	TRUE	0.4851371	12.634455	0e+00	0.0000003
chr2:218344808-218346756/+/PNKD	chr2	218344808	218346756	PNKD	+	0.0000000	0.0887807	TRUE	0.4852883	12.825710	0e+00	0.0000003
chr2:218344808-218346791/+/PNKD	chr2	218344808	218346791	PNKD	+	0.0000000	0.0899427	TRUE	0.4868878	11.778748	0e+00	0.0000009
chr2:218344808-218346771/+/PNKD	chr2	218344808	218346771	PNKD	+	0.0000000	0.0879248	TRUE	0.4843662	12.855317	0e+00	0.0000003
chr5:179811988-179813641/+/SQSTM1	chr5	179811988	179813641	SQSTM1	+	0.0003873	0.0141935	TRUE	0.3137492	9.096496	0e+00	0.0000399
chr5:179812407-179813598/+/SQSTM1	chr5	179812407	179813598	SQSTM1	+	0.0003694	0.0142486	TRUE	0.3140973	8.869969	0e+00	0.0000540
chrX:119582588-119582676/+/UBE2A	chrX	119582588	119582676	UBE2A	+	0.0000000	0.0315370	TRUE	0.3095027	7.926565	1e-07	0.0002523
chrX:119583127-119583347/+/UBE2A	chrX	119583127	119583347	UBE2A	+	0.0000000	0.0391280	TRUE	0.3247510	9.360876	0e+00	0.0000297
chrX:119583127-119584101/+/UBE2A	chrX	119583127	119584101	UBE2A	+	0.0000000	0.0363548	TRUE	0.3127182	8.108500	1e-07	0.0001945
chrX:119583127-119583417/+/UBE2A	chrX	119583127	119583417	UBE2A	+	0.0000000	0.0428587	TRUE	0.3280701	9.444139	0e+00	0.0000274
chrX:119583127-119583621/+/UBE2A	chrX	119583127	119583621	UBE2A	+	0.0000000	0.0413061	TRUE	0.3235995	8.773698	0e+00	0.0000612
chrX:151404916-151405157/+/VMA21	chrX	151404916	151405157	VMA21	+	0.0000000	0.0550083	TRUE	0.3621630	7.798756	1e-07	0.0002985
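
Several genes in this table (for example TPM4, PNKD and UBE2A) appear many times because overlapping exons are tested separately. A minimal sketch of collapsing the result to gene level is shown below. It uses a small data frame with a few rows copied from the table so that it runs on its own; in a real analysis the feature metadata obj[['exon']][[]] would take the place of meta.

```r
# A few rows copied from the table above (feature metadata of the exon assay).
meta <- data.frame(
  gene_name = c("CD44", "MYL6", "TPM4", "TPM4", "SERF2", "SERF2"),
  gene_name.padj = c(0.0000317, 0, 0.0000080, 0, 0.0002985, 0.0000214),
  row.names = c(
    "chr11:35197162-35197793/+/CD44",
    "chr12:56161387-56161465/+/MYL6",
    "chr19:16093684-16093753/+/TPM4",
    "chr19:16095264-16095357/+/TPM4",
    "chr15:43801518-43801569/+/SERF2",
    "chr15:43801711-43804427/+/SERF2"
  )
)

# Untested features carry NA, so drop them explicitly before thresholding.
sig <- meta[!is.na(meta$gene_name.padj) & meta$gene_name.padj < 0.001, ]

# Number of prioritized exons per gene, most frequent first.
exons_per_gene <- sort(table(sig$gene_name), decreasing = TRUE)
print(exons_per_gene)
```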

# Randomly select a gene and one of its exons, and visualize them with FeaturePlot.
FeaturePlot(obj, features = c("chr15:43801711-43804427/+/SERF2", "SERF2"), ncol=2)

# With the default colors and parameters it can be hard to tell the exon and its binding gene apart. Let's change the color scale, enlarge the point size, and order cells by expression.
require(RColorBrewer)

Loading required package: RColorBrewer

FeaturePlot(obj, features = c("chr15:43801711-43804427/+/SERF2", "SERF2"), ncol=2, order = TRUE, pt.size=1) & scale_colour_gradientn(colours = rev(brewer.pal(n = 11, name = "RdBu")))

Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

We can also map the ratio of exon expression to gene expression on the UMAP. The RatioPlot function is designed for this purpose. As observed, the gene SERF2 is expressed at relatively low levels in some cell groups, while the ratio of the exon chr15:43801711-43804427/+/SERF2 is higher in these groups.

RatioPlot(obj, features = c("chr15:43801711-43804427/+/SERF2"), assay = 'exon', bind.assay = 'RNA', bind.name = "gene_name", order = TRUE, pt.size=1)
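
The idea of an exon-to-gene ratio can be sketched with a few toy numbers. The counts below are hypothetical, and the handling of cells without gene counts (set to NA here) is an assumption; which layer RatioPlot reads and how it treats such cells is not stated in this vignette.

```r
# Toy counts for one exon and its gene in six cells (hypothetical values).
exon_counts <- c(0, 1, 2, 0, 3, 1)
gene_counts <- c(0, 4, 2, 5, 3, 10)

# The ratio is undefined where the gene has no counts, so mark those cells NA.
ratio <- ifelse(gene_counts > 0, exon_counts / gene_counts, NA_real_)
print(round(ratio, 2))
```

A cell with few gene counts but a high ratio is exactly the situation described above: the gene looks weak there, yet a large share of its signal comes from this exon.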

In the feature plot and ratio plot above, the exon appears to lack a strong expression pattern across cell groups, whereas the gene SERF2 is highly expressed in most groups, with a few exceptions. This inconsistent expression pattern between the exon and its corresponding gene may suggest differential exon usage. To explore the coverage details of both the exon and the gene body, we will generate a track plot next.

In our package, retrieving gene locations requires loading a GTF file instead of relying on current Bioconductor databases, such as org.Hs.eg.db. This is due to the varying versions of gene annotations provided by different institutes, which can introduce inconsistencies. To avoid potential bias during preprocessing and postprocessing, we strongly recommend using the same GTF file consistently throughout your project. The Yano package includes the gtf2db function, which enables you to load a GTF file into memory for further analysis.

gtf <- gtf2db("./gencode.v44.annotation.gtf.gz")

[2025-06-26 00:23:49] GTF loading..
[2025-06-26 00:24:11] Load 62700 genes.

A track plot is used to study the read coverage per cell group. In the track plot shown below, the cell groups are specified by the cell.group parameter. Unlike IGV, which shows read depth, this plot shows UMI depth. The cell barcode tag and UMI tag default to “CB” and “UB” and can be changed with the parameters cell.tag and umi.tag. For each cell group, the UMI depth is normalized by the number of cells in that group, so the depth at each location can be interpreted as the mean UMI depth per cell for the group. As a result, the tracks are directly comparable across cell groups. If cell.group is not set, the track plot shows the raw UMI depth per location.

TrackPlot(bamfile = "Parent_SC3v3_Human_Glioblastoma_possorted_genome_bam.bam", gtf = gtf, gene = "SERF2", cell.group = Idents(obj), highlights = c(43801711,43804427) )
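
The per-group normalization described above amounts to dividing each group's UMI depth by its number of cells. The sketch below shows this on hypothetical numbers; it illustrates the arithmetic only and is not TrackPlot's code.

```r
# Hypothetical UMI depth at five positions for two cell groups.
umi_depth <- rbind(
  group0 = c(120, 300, 280, 40, 0),
  group1 = c(10, 30, 25, 12, 3)
)
cells_per_group <- c(group0 = 600, group1 = 50)

# Divide each row by its group's cell count: mean UMI depth per cell.
mean_depth <- umi_depth / cells_per_group[rownames(umi_depth)]
print(round(mean_depth, 2))
```

In this toy case the raw depth of group0 is about ten times that of group1, yet the per-cell depth at the first position is the same in both groups. The raw difference therefore reflects group size rather than expression.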

In the track plot, we can easily observe that an exon around position 43,794,000 dominates the expression of the SERF2 gene and is highly expressed in many cell groups. However, the exon ‘chr15:43801711-43804427/+/SERF2’ (highlighted) shows low expression and is not visible in the track plot. To visualize lowly expressed exons, we can set the max.depth parameter to 2, which caps the UMI depth at 2. Because there are many genes in this region, we also set display.genes to show SERF2 only. These adjustments make the lowly expressed exons and their related transcripts easier to see. In this case, we can see that the highlighted exon shows a different expression pattern from the gene SERF2.

TrackPlot(bamfile = "Parent_SC3v3_Human_Glioblastoma_possorted_genome_bam.bam", gtf = gtf, gene = "SERF2", cell.group = Idents(obj), highlights = c(43801711,43804427), max.depth = 2, display.genes = "SERF2")

3. Load junction assay

In addition to exon expression, junction expression can provide insights into different expression patterns across transcripts, offering a complementary perspective. Junction expression refers to the UMI counts of reads that span more than one exon. It’s important to note that junctions are named similarly to exons, but the start and end positions are different. The start of the junction corresponds to the end of the previous exon, while the end of the junction represents the start of the next exon.
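
The coordinate convention for junctions can be checked directly on feature names that appear in this vignette's result tables. The helper coords below is hypothetical and only extracts the start and end from a name of the form chr:start-end/strand/gene.

```r
# Feature names taken from this vignette's result tables.
upstream_exon   <- "chr19:16093684-16093753/+/TPM4"
junction        <- "chr19:16093753-16095264/+/TPM4"
downstream_exon <- "chr19:16095264-16095357/+/TPM4"

# Hypothetical helper: extract c(start, end) from "chr:start-end/strand/gene".
coords <- function(x) {
  range_part <- sub("^[^:]+:([0-9]+-[0-9]+)/.*$", "\\1", x)
  as.integer(strsplit(range_part, "-")[[1]])
}

j <- coords(junction)
stopifnot(j[1] == coords(upstream_exon)[2])    # junction start = upstream exon end
stopifnot(j[2] == coords(downstream_exon)[1])  # junction end = downstream exon start
cat("junction spans", j[2] - j[1], "bp between the two exons\n")
```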

junction <- ReadPISA("./junction/")
obj[['junction']] <- CreateAssayObject(junction[, colnames(obj)], min.cells=20)

DefaultAssay(obj) <- "junction"
obj <- NormalizeData(obj)

# select spatial autocorrelated junctions
obj <- RunAutoCorr(obj)

Working on assay : junction

Run autocorrelation test for 14786 features.

Runtime : 2.486992 secs

7531 autocorrelated features.

# Parse the gene name and coordinates from junction names
obj <- ParseExonName(obj)

Working on assay junction

# perform dissimilarity test between junctions and their binding genes
obj <- RunSDT(obj, bind.name = "gene_name", bind.assay = "RNA", perm=20)

Working on assay junction.

Working on binding assay RNA.

Use predefined weight matrix "pca_wm".

Processing 7531 features.

Processing 3775 binding features.

Retrieve binding data from assay RNA.

Use "data" layer for test features and binding features.

Using 7 threads.

Runtime : 14.9623 secs.

FbtPlot(obj, val="gene_name.padj", remove.chr=TRUE, sel.chrs = sel.chrs)

obj[['junction']][[]] %>% filter(gene_name.padj<1e-5)

                               moransi.pval   moransi autocorr.variable   chr
chr12:56160320-56161387/+/MYL6            0 0.1921649              TRUE chr12
chr19:16093753-16095264/+/TPM4            0 0.1909011              TRUE chr19
                                  start      end gene_name strand gene_name.D
chr12:56160320-56161387/+/MYL6 56160320 56161387      MYL6      +   0.7466745
chr19:16093753-16095264/+/TPM4 16093753 16095264      TPM4      +   0.5364398
                               gene_name.t gene_name.pval gene_name.padj
chr12:56160320-56161387/+/MYL6    13.04541   3.121098e-11   1.175250e-07
chr19:16093753-16095264/+/TPM4    18.46030   6.795704e-14   5.117845e-10

# Because both exons and junctions are compared with genes, it is reasonable to combine the two assays in one plot
FbtPlot(obj, val="gene_name.padj", assay = c("exon", "junction"), col.by = "assay", shape.by = "assay", pt.size = 2, remove.chr = TRUE, sel.chrs = sel.chrs, cols = c("red", "blue"))

# There is an exon and a junction on chromosome 12 with very low p values (<1e-8); let's see which gene they belong to
obj[['exon']][[]] %>% filter(chr == "chr12" & gene_name.pval < 1e-8) %>% knitr::kable()

	chr	start	end	gene_name	strand	moransi.pval	moransi	autocorr.variable	gene_name.D	gene_name.t	gene_name.pval	gene_name.padj
chr12:56161387-56161465/+/MYL6	chr12	56161387	56161465	MYL6	+	0	0.2516071	TRUE	0.7751959	23.31759	0	0

obj[['junction']][[]] %>% filter(chr == "chr12" & gene_name.pval < 1e-8) %>% knitr::kable()

	moransi.pval	moransi	autocorr.variable	chr	start	end	gene_name	strand	gene_name.D	gene_name.t	gene_name.pval	gene_name.padj
chr12:56160320-56161387/+/MYL6	0	0.1921649	TRUE	chr12	56160320	56161387	MYL6	+	0.7466745	13.04541	0	1e-07

FeaturePlot(obj, features = c("chr12:56161387-56161465/+/MYL6","chr12:56160320-56161387/+/MYL6", "MYL6"), order = TRUE, pt.size = 2, ncol=3) & scale_colour_gradientn(colours = rev(brewer.pal(n = 11, name = "RdBu")))

Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

# We can also plot the expression ratio of these exons and junctions on the UMAP
p1 <- RatioPlot(obj, assay = "exon", bind.assay = "RNA", bind.name = "gene_name", features = "chr12:56161387-56161465/+/MYL6")
p2 <- RatioPlot(obj, assay = "exon", bind.assay = "RNA", bind.name = "gene_name", features = "chr12:56160626-56160670/+/MYL6")
p3 <- RatioPlot(obj, assay = "junction", bind.assay = "RNA", bind.name = "gene_name", features = "chr12:56160320-56161387/+/MYL6")
cowplot::plot_grid(p1,p2,p3, ncol=3)

We then visualize the track plot for this gene, including junction reads by setting junc=TRUE. The height of the splice paths in the plot represents the expression level of each junction within the specified cell group.

TrackPlot(bamfile = "Parent_SC3v3_Human_Glioblastoma_possorted_genome_bam.bam", gtf = gtf, gene = "MYL6", cell.group = Idents(obj), junc = TRUE, highlights = list(c(56160320,56161387),c(56161387,56161465)))

You might be wondering why the exon chr12:56161387-56161465/+/MYL6 appears highly expressed in cell group 3 in the track plot, where the overlapping peak is clearly higher than in other groups, but its expression level in the feature plot is not as high as expected.

This discrepancy arises because the exon is overlapping with other exons from different transcripts. We only count reads that are fully contained within the exon as part of the exon’s expression. Therefore, reads that partially overlap with this exon are not included in the count.

In contrast, the overlapping exon chr12:56161387-56161575/+/MYL6 shows higher expression in group 3 compared to other groups. It’s important to note that if a read is fully contained within two or more overlapping exons, PISA will count it for all relevant exons. Check PISA’s manual for details.
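
The counting rule described above can be sketched with two overlapping exons and a handful of toy reads. The exon coordinates come from this vignette; the read positions are hypothetical. The sketch ignores UMIs, strand and spliced alignments, so it illustrates the containment rule only. PISA's manual remains the reference for the real behaviour.

```r
# Two overlapping exons (coordinates from this vignette).
exons <- data.frame(
  start = c(56161387, 56161387),
  end   = c(56161465, 56161575),
  row.names = c("chr12:56161387-56161465/+/MYL6",
                "chr12:56161387-56161575/+/MYL6")
)
# Hypothetical read alignments on chr12.
reads <- data.frame(
  start = c(56161390, 56161400, 56161450, 56161500),
  end   = c(56161460, 56161465, 56161540, 56161570)
)

# A read counts for an exon only if it lies entirely inside that exon.
count_contained <- function(exon_start, exon_end, reads) {
  sum(reads$start >= exon_start & reads$end <= exon_end)
}
counts <- mapply(count_contained, exons$start, exons$end,
                 MoreArgs = list(reads = reads))
names(counts) <- rownames(exons)
print(counts)
```

The first two reads fit inside both exons and are counted for both. The last two extend past the end of the shorter exon, so they contribute only to the longer one even though the third read overlaps the shorter exon and adds to its coverage in a track plot.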

p1 <- DimPlot(obj, label=TRUE, label.size = 5, label.box = TRUE)
p2 <- FeaturePlot(obj, features = c("chr12:56161387-56161575/+/MYL6"), order = TRUE, pt.size = 1) & scale_colour_gradientn(colours = rev(brewer.pal(n = 11, name = "RdBu")))

Warning: Could not find chr12:56161387-56161575/+/MYL6 in the default search
locations, found in 'exon' assay instead

Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

p1 + p2

4. Heatmap analysis to highlight group-specific alternative splicing

The spatial dissimilarity test method prioritizes alternatively spliced exons and junctions across all cells but does not identify which specific cell groups exhibit these splicing events. To address this, let’s manually extract the scaled expression data for the selected alternatively spliced exons and their corresponding genes, then perform a co-clustering analysis. A comprehensive heatmap will be generated using the ComplexHeatmap package, providing a visual representation of the exon and gene distribution across cell groups.

obj[['exon']][[]] %>% filter(gene_name.padj<0.001) %>% rownames -> exons
obj[['exon']][[]] %>% filter(gene_name.padj<0.001) %>% pull(gene_name) -> bind.genes
DefaultAssay(obj) <- "RNA"
obj <- ScaleData(obj, features = unique(bind.genes))
DefaultAssay(obj) <- "exon"
obj <- ScaleData(obj, features = exons)

dat1 <- GetAssayData(obj, assay = 'exon', layer = 'scale.data')
dat2 <- GetAssayData(obj, assay = 'RNA', layer = 'scale.data')
idents <- sort(Idents(obj))
order.cells <- names(idents)

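# Repeat each gene row once per exon and label it with the exon name
dat2 <- dat2[bind.genes,]
rownames(dat2) <- exons
# Make sure the exon rows are in the same order as the relabelled gene rows
dat1 <- dat1[exons,]

# Combine exon and gene values so that rows are clustered on both
dat <- cbind(dat1, dat2)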

require(ComplexHeatmap)
d <- dist(dat)
hc <- hclust(d)
idx <- hc$labels[hc$order]

ha <- HeatmapAnnotation(group=idents, border = TRUE)
ht1 <- Heatmap(dat1[idx, order.cells], cluster_rows = FALSE, cluster_columns = FALSE, show_column_names = FALSE, border = TRUE,  top_annotation = ha, name = "exon", column_title = "exon")
ht2 <- Heatmap(dat2[idx, order.cells], cluster_rows = FALSE, cluster_columns = FALSE, show_column_names = FALSE, border = TRUE,  top_annotation = ha, name = "gene", column_title = "gene", row_names_max_width = max_text_width(rownames(dat2), gp = gpar(fontsize = 12)))

ht <- ht1 + ht2
draw(ht, heatmap_legend_side = "left",  annotation_legend_side = "left")
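
The heatmap shows individual cells. When a compact summary is preferred, the scaled values can be averaged per cell group. The sketch below does this on a small random matrix that stands in for the scaled data; with the real object, dat1 or dat2 and Idents(obj) would take the place of scaled and groups.

```r
set.seed(7)
# Toy scaled expression: 3 features x 8 cells (random stand-in values).
scaled <- matrix(rnorm(3 * 8), nrow = 3,
                 dimnames = list(paste0("exon", 1:3), paste0("cell", 1:8)))
groups <- factor(rep(c("0", "1"), each = 4))   # toy cluster labels

# Mean scaled expression of every feature within each group.
group_means <- sapply(levels(groups), function(g) {
  rowMeans(scaled[, groups == g, drop = FALSE])
})
print(round(group_means, 2))
```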

5. Test between exon and exon-skipped reads

In the previous sections, we conducted a spatial dissimilarity test between exon/junction expression and gene expression. However, binding features are not always limited to genes; they can also correspond to other types of features. In this section, we perform a test between exon expression and reads that skip this exon (exclude assay). This approach is similar to the Percent Spliced In (PSI) method, which is widely used to analyze alternative splicing in both bulk and single-cell RNA-seq data. The PSI is calculated as: PSI = exon reads / (exon reads + reads skipping this exon).
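
The PSI formula above can be written out for a single exon across a few cells. The counts are hypothetical, and treating cells with no informative reads as NA is a choice made for this sketch.

```r
# Toy per-cell counts for one exon (hypothetical values).
exon_reads    <- c(5, 0, 2, 0, 8)   # reads supporting inclusion of the exon
exclude_reads <- c(0, 3, 2, 0, 2)   # reads skipping the exon

total <- exon_reads + exclude_reads
# PSI is undefined for cells with no informative reads.
psi <- ifelse(total > 0, exon_reads / total, NA_real_)
print(psi)
```

In sparse single-cell data many cells fall into the undefined case. This is one reason to compare the spatial patterns of the two read types across cells instead of relying on per-cell PSI values alone.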

# The reads that skip exons are annotated using the `-psi` option in PISA anno, and these counts are stored in the `exclude` directory. We then load these excluded counts into a new assay.
exclude <- ReadPISA("exclude/")
obj[['exclude']] <- CreateAssayObject(exclude[,colnames(obj)], min.cells = 10)
DefaultAssay(obj) <- "exclude"
# Normalize counts for exon-excluded reads
obj <- NormalizeData(obj)
# Then we switch to exon assay
DefaultAssay(obj) <- "exon"

# Because the feature names in the exclude assay are exactly the same as those in the exon assay, they represent the reads that skip each corresponding exon. Therefore, we set up the binding feature using the exon name itself.
obj[['exon']][['exon_name']] <- rownames(obj)
obj[['exon']][['exon_name']] %>% head

                                                                exon_name
chr1:135141-135895/-/ENSG00000268903 chr1:135141-135895/-/ENSG00000268903
chr1:629640-630683/+/MTND2P28               chr1:629640-630683/+/MTND2P28
chr1:631074-632616/+/MTCO1P12               chr1:631074-632616/+/MTCO1P12
chr1:632757-633438/+/MTCO2P12               chr1:632757-633438/+/MTCO2P12
chr1:633696-634376/+/MTATP6P1               chr1:633696-634376/+/MTATP6P1
chr1:634376-634922/+/MTCO3P12               chr1:634376-634922/+/MTCO3P12

# Then we perform spatial dissimilarity test between exon and exclude, mode 1
obj <- RunSDT(obj, bind.name = "exon_name", bind.assay = "exclude")

# Switch to the exon-excluded assay
DefaultAssay(obj) <- "exclude"
obj <- RunAutoCorr(obj)
obj <- ParseExonName(obj)
obj[['exclude']][['exon_name']] <- rownames(obj)

obj <- RunSDT(obj, bind.name = "exon_name", bind.assay = "exon")

FbtPlot(obj, val = "exon_name.padj", remove.chr = TRUE, assay = c("exclude", "exon"), shape.by = "assay", col.by = "assay", cols = c("yellow", "green"), pt.size = 2)

# Let's see how many exons are prioritized by both the exon assay and the exclude assay
obj[['exclude']][[]] %>% filter(exon_name.padj<1e-5) %>% rownames -> sel1
obj[['exon']][[]] %>% filter(exon_name.padj<1e-5) %>% rownames -> sel2
intersect(sel1,sel2)

 [1] "chr1:19342752-19342864/-/CAPZB"    "chr1:153990914-153991034/+/RPS27" 
 [3] "chr10:7806974-7807010/+/ATP5F1C"   "chr10:128047570-128047683/+/PTPRE"
 [5] "chr12:56160626-56160670/+/MYL6"    "chr12:56160626-56160945/+/MYL6"   
 [7] "chr19:16095264-16095357/+/TPM4"    "chr19:16095264-16095454/+/TPM4"   
 [9] "chr19:16095264-16095496/+/TPM4"    "chr19:16095264-16095893/+/TPM4"   
[11] "chr19:16095264-16096744/+/TPM4"    "chr19:16095264-16095591/+/TPM4"   
[13] "chr2:197490527-197490664/-/HSPD1"  "chr20:37238402-37238449/+/RPN2"   
[15] "chr20:49280903-49280964/+/ZFAS1"   "chr3:197953488-197953660/+/RPL35A"
[17] "chr5:83519349-83522309/+/VCAN"     "chr6:85677795-85677875/-/SNHG5"   
[19] "chr6:85677791-85677875/-/SNHG5"    "chr9:35684732-35684807/-/TPM2"    
[21] "chr9:35684732-35684802/-/TPM2"     "chrX:154400464-154400626/+/RPL10"

DefaultAssay(obj) <- "exclude"
p1 <- FeaturePlot(obj, features = c("chr19:16095264-16095357/+/TPM4"),order = TRUE)
DefaultAssay(obj) <- "exon"
p2 <- FeaturePlot(obj, features = c("chr19:16095264-16095357/+/TPM4"),order = TRUE)
p3 <- PSIPlot(obj, exon.assay = "exon", exclude.assay = "exclude", features = c("chr19:16095264-16095357/+/TPM4"),order = TRUE)
cowplot::plot_grid(p1,p2,p3, ncol=3)

TrackPlot(bamfile = "Parent_SC3v3_Human_Glioblastoma_possorted_genome_bam.bam", gtf = gtf, gene = "TPM4", cell.group = Idents(obj), highlights = c(16095264,16095357), junc = TRUE, max.depth = 1)

In previous sections, we noted that exon or junction expression is part of gene expression, and inverse expression patterns can strongly indicate alternative splicing. However, exon-skipped reads are largely independent of exon expression, making them more sensitive for detecting alternative splicing and allowing for the prioritization of many events.

Because our spatial dissimilarity test does not account for the spatial dependency of the binding feature, numerous events may be prioritized, especially when the binding feature is sparsely expressed. While some of these events may be true, others might arise due to low coverage. To enhance detection power and reduce potential false positives, intersecting the prioritized exons with the prioritized exon-excluded features can help refine the results.

Mode 3 can be used as an alternative to mode 1 to specifically detect events with strong inverse expression patterns. In mode 3, the exon reads and exon-excluded reads are summed as the binding assay, allowing for a more targeted analysis of such patterns. It is important to note that events detected with mode 3 are always detectable with mode 1, making mode 3 a refined approach for prioritizing inverse expression events.

DefaultAssay(obj) <- "exclude"
obj <- RunSDT(obj, bind.name = "exon_name", bind.assay = "exon", mode = 3, prefix = "mode3")
DefaultAssay(obj) <- "exon"
obj <- RunSDT(obj, bind.name = "exon_name", bind.assay = "exclude", mode = 3, prefix = "mode3")
FbtPlot(obj, val = "mode3.padj", remove.chr = TRUE, assay = c("exclude", "exon"), shape.by = "assay", col.by = "assay", pt.size = 2, cols=c("yellow", "green"))

In this case study, we performed spatial dissimilarity tests between various feature pairs. This method provides an overview of the entire cell population and does not rely on prior cell clustering or annotation, making it a powerful tool for analyzing single-cell data without prior knowledge. For 3’ or 5’ biased scRNA-seq data, we recommend testing different feature types, including junctions and exons, against their corresponding genes. Additionally, testing exon-included reads against exon-skipped reads has more power to detect exon-skipping events.

To obtain cell-cluster-specific expression patterns, applying heatmaps and clustering in subsequent analyses is recommended.

Questions?

If you have any questions regarding this vignette and the usage of Yano, please feel free to report them through the discussion forum. When submitting your query, please ensure you attach the commands you used for better clarity and support.

Command(obj)

 [1] "NormalizeData.RNA"            "FindVariableFeatures.RNA"    
 [3] "RunPCA.RNA"                   "FindNeighbors.RNA.pca"       
 [5] "FindClusters"                 "RunUMAP.RNA.pca"             
 [7] "ParseExonName.exon"           "NormalizeData.exon"          
 [9] "RunAutoCorr.exon.pca"         "SetAutoCorrFeatures.exon"    
[11] "NormalizeData.junction"       "RunAutoCorr.junction.pca"    
[13] "SetAutoCorrFeatures.junction" "ParseExonName.junction"      
[15] "RunSDT.junction"              "ScaleData.RNA"               
[17] "ScaleData.exon"               "NormalizeData.exclude"       
[19] "RunAutoCorr.exclude.pca"      "SetAutoCorrFeatures.exclude" 
[21] "ParseExonName.exclude"        "RunSDT.exclude"              
[23] "RunSDT.exon"

sessionInfo()

R version 4.5.0 (2025-04-11)
Platform: aarch64-apple-darwin24.4.0
Running under: macOS Sequoia 15.5

Matrix products: default
BLAS:   /opt/homebrew/Cellar/openblas/0.3.29/lib/libopenblasp-r0.3.29.dylib 
LAPACK: /opt/homebrew/Cellar/r/4.5.0/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] ComplexHeatmap_2.24.0 RColorBrewer_1.1-3    future_1.49.0        
[4] dplyr_1.1.4           Seurat_5.3.0          SeuratObject_5.1.0   
[7] sp_2.2-0              ggplot2_3.5.2         Yano_1.2             

loaded via a namespace (and not attached):
  [1] shape_1.4.6.1          jsonlite_2.0.0         magrittr_2.0.3        
  [4] spatstat.utils_3.1-4   farver_2.1.2           rmarkdown_2.29        
  [7] GlobalOptions_0.1.2    vctrs_0.6.5            ROCR_1.0-11           
 [10] Cairo_1.6-2            spatstat.explore_3.4-2 htmltools_0.5.8.1     
 [13] sctransform_0.4.2      parallelly_1.44.0      KernSmooth_2.23-26    
 [16] htmlwidgets_1.6.4      ica_1.0-3              plyr_1.8.9            
 [19] plotly_4.10.4          zoo_1.8-14             igraph_2.1.4          
 [22] mime_0.13              lifecycle_1.0.4        iterators_1.0.14      
 [25] pkgconfig_2.0.3        Matrix_1.7-3           R6_2.6.1              
 [28] fastmap_1.2.0          clue_0.3-66            fitdistrplus_1.2-2    
 [31] shiny_1.10.0           digest_0.6.37          colorspace_2.1-1      
 [34] S4Vectors_0.46.0       patchwork_1.3.0        tensor_1.5            
 [37] RSpectra_0.16-2        irlba_2.3.5.1          labeling_0.4.3        
 [40] progressr_0.15.1       spatstat.sparse_3.1-0  httr_1.4.7            
 [43] polyclip_1.10-7        abind_1.4-8            compiler_4.5.0        
 [46] withr_3.0.2            doParallel_1.0.17      viridis_0.6.5         
 [49] fastDummies_1.7.5      R.utils_2.13.0         MASS_7.3-65           
 [52] rjson_0.2.23           gtools_3.9.5           tools_4.5.0           
 [55] lmtest_0.9-40          httpuv_1.6.16          future.apply_1.11.3   
 [58] goftest_1.2-3          R.oo_1.27.1            glue_1.8.0            
 [61] nlme_3.1-168           promises_1.3.2         Rtsne_0.17            
 [64] cluster_2.1.8.1        reshape2_1.4.4         generics_0.1.4        
 [67] gtable_0.3.6           spatstat.data_3.1-6    R.methodsS3_1.8.2     
 [70] tidyr_1.3.1            data.table_1.17.2      BiocGenerics_0.54.0   
 [73] spatstat.geom_3.3-6    RcppAnnoy_0.0.22       ggrepel_0.9.6         
 [76] RANN_2.6.2             foreach_1.5.2          pillar_1.10.2         
 [79] stringr_1.5.1          spam_2.11-1            RcppHNSW_0.6.0        
 [82] later_1.4.2            circlize_0.4.16        splines_4.5.0         
 [85] lattice_0.22-6         survival_3.8-3         deldir_2.0-4          
 [88] tidyselect_1.2.1       miniUI_0.1.2           pbapply_1.7-2         
 [91] knitr_1.50             gridExtra_2.3          IRanges_2.42.0        
 [94] scattermore_1.2        stats4_4.5.0           xfun_0.52             
 [97] matrixStats_1.5.0      stringi_1.8.7          lazyeval_0.2.2        
[100] yaml_2.3.10            evaluate_1.0.3         codetools_0.2-20      
[103] tibble_3.2.1           cli_3.6.5              uwot_0.2.3            
[106] xtable_1.8-4           reticulate_1.42.0      Rcpp_1.0.14           
[109] globals_0.18.0         spatstat.random_3.3-3  png_0.1-8             
[112] spatstat.univar_3.1-3  parallel_4.5.0         dotCall64_1.2         
[115] listenv_0.9.1          viridisLite_0.4.2      scales_1.4.0          
[118] ggridges_0.5.6         purrr_1.0.4            crayon_1.5.3          
[121] GetoptLong_1.0.5       rlang_1.1.6            cowplot_1.1.3