require(Yano)
Loading required package: Yano
── Attaching packages ────────────────────────────────────────────── Yano 1.0 ──
✔ dplyr 1.1.4 ✔ Seurat 5.2.1
✔ ggplot2 3.5.1
This vignette uses gene, exon, and junction expression files generated from the Annotate Various Features for Alignment. While current state-of-the-art scRNA-seq methods tend to be biased towards the 3’ or 5’ ends of transcripts, it is still possible to obtain coverage information for a subset of exons. Despite the sparsity of gene and exon expression in single cells, our spatial dissimilarity test leverages the spatial distribution properties of features. This means that even features with low overall expression but strong spatial expression patterns across cells can still be highlighted. By performing a spatial dissimilarity test between exon/junction and gene expression, we can predict potential alternative splicing events.
Yano
package before proceeding with the testing.Load Yano will automatically load Seurat.
Loading required package: Yano
── Attaching packages ────────────────────────────────────────────── Yano 1.0 ──
✔ dplyr 1.1.4 ✔ Seurat 5.2.1
✔ ggplot2 3.5.1
In this section, we will perform the standard Seurat analysis pipeline. Since the spatial dissimilarity test is not rely on cell clustering so changing the resolution or other parameters for FindClusters
and RunUMAP
will not impact the outcome of the spatial dissimilarity test.
# Create Seurat object and filter droplets with fewer than 1000 genes
obj <- CreateSeuratObject(exp, min.features = 1000, min.cells = 10)
# Filter low quality droplets
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
obj <- subset(obj, nFeature_RNA < 9000 & percent.mt < 20)
# Downsampling to 2000 cells for fast testing
obj <- obj[, sample(colnames(obj),2000)]
# We run the cell clustering analysis with Seurat pipeline
obj <- NormalizeData(obj) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA(verbose=FALSE) %>% FindNeighbors(dims = 1:10, verbose=FALSE) %>% FindClusters(resolution = 0.5, verbose=FALSE) %>% RunUMAP(dims=1:10, verbose=FALSE)
Normalizing layer: counts
Finding variable features for layer counts
Centering and scaling data matrix
In this section, we will compare exon expression patterns with the expression patterns of their corresponding genes in a spatial context. Here, the term “spatial” refers to the organization of cells in space. In this vignette, we will use the PCA space for the analysis, but the approach can also be applied to lineage trajectories, spatial coordinates or integration space such as harmony. The spatial dissimilarity test is divided into several steps. First, we will load exon data as a new assay in the Seurat object. In the second step, we will perform a spatial autocorrelation test for all exons and select the ones that show significant autocorrelation for further analysis. Next, we will define the binding relationship between exons and their corresponding genes and run the spatial dissimilarity test. After testing, P values and adjusted P values for each exon will be provided.
# Read exon count matrix file
exon <- ReadPISA("./exon/")
# Load the exon expression to Seurat object as a new assay, make sure the exon matrix has the same cells.
obj[['exon']] <- CreateAssayObject(exon[, colnames(obj)], min.cells=20)
# Switch work assay to exon
DefaultAssay(obj) <- "exon"
# Empty info for exon features
head(obj[['exon']][[]]) %>% knitr::kable()
chr1:135141-135895/-/ENSG00000268903 |
chr1:629640-630683/+/MTND2P28 |
chr1:631074-632616/+/MTCO1P12 |
chr1:632757-633438/+/MTCO2P12 |
chr1:633696-634376/+/MTATP6P1 |
chr1:634376-634922/+/MTCO3P12 |
Working on assay exon
chr | start | end | gene_name | strand | |
---|---|---|---|---|---|
chr1:135141-135895/-/ENSG00000268903 | chr1 | 135141 | 135895 | ENSG00000268903 | - |
chr1:629640-630683/+/MTND2P28 | chr1 | 629640 | 630683 | MTND2P28 | + |
chr1:631074-632616/+/MTCO1P12 | chr1 | 631074 | 632616 | MTCO1P12 | + |
chr1:632757-633438/+/MTCO2P12 | chr1 | 632757 | 633438 | MTCO2P12 | + |
chr1:633696-634376/+/MTATP6P1 | chr1 | 633696 | 634376 | MTATP6P1 | + |
chr1:634376-634922/+/MTCO3P12 | chr1 | 634376 | 634922 | MTCO3P12 | + |
# Normalize the data for spatial autocorrelation test
obj <- NormalizeData(obj)
obj <- RunAutoCorr(obj)
Working on assay : exon
Run autocorrelation test for 117043 features.
Runtime : 9.761797 secs
71943 autocorrelated features.
The permutation process can be computationally expensive. In the example below, I set perm=20 to perform only 20 permutations for quicker results. However, the default setting runs 100 permutations for more accurate evaluation. While it’s possible to increase the number of permutations for even more precision, it may not always be necessary. If you’re running Yano for the first time on your dataset, setting perm=20 can help you save time and provide an initial overview of the entire dataset.
Working on assay exon.
Working on binding assay RNA.
Processing 71943 features.
Processing 15000 blocks.
Retrieve binding data from assay RNA.
Use "data" layer for test features and binding features.
Using 7 threads.
Using method "D" with mode 1.
Use predefined weight matrix "pca_wm".
Runtime : 1.279967 mins.
chr | start | end | gene_name | strand | moransi.pval | moransi | autocorr.variable | gene_name.D | gene_name.r | gene_name.pval | gene_name.mean | gene_name.var | gene_name.padj | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr1:135141-135895/-/ENSG00000268903 | chr1 | 135141 | 135895 | ENSG00000268903 | - | 0.0000000 | 0.0215770 | TRUE | 0.0002328 | 0.0897475 | 0.9999923 | 0.1243321 | 0.0215897 | 1 |
chr1:629640-630683/+/MTND2P28 | chr1 | 629640 | 630683 | MTND2P28 | + | 0.0000005 | 0.0200792 | TRUE | 0.0016041 | 0.0211544 | 0.9999999 | 0.1370532 | 0.0173937 | 1 |
chr1:631074-632616/+/MTCO1P12 | chr1 | 631074 | 632616 | MTCO1P12 | + | 0.7812383 | -0.0036840 | FALSE | NA | NA | NA | NA | NA | NA |
chr1:632757-633438/+/MTCO2P12 | chr1 | 632757 | 633438 | MTCO2P12 | + | 0.5565309 | -0.0010714 | FALSE | NA | NA | NA | NA | NA | NA |
chr1:633696-634376/+/MTATP6P1 | chr1 | 633696 | 634376 | MTATP6P1 | + | 0.0000000 | 0.3134653 | TRUE | 0.0026167 | 0.0534770 | 0.9998706 | 0.1268494 | 0.0277544 | 1 |
chr1:634376-634922/+/MTCO3P12 | chr1 | 634376 | 634922 | MTCO3P12 | + | 0.8285807 | -0.0042338 | FALSE | NA | NA | NA | NA | NA | NA |
The chromosome names are too long and tend to overlap in the visualization. To resolve this, you can either resize the labels or remove the ‘chr’ prefix from the chromosome names. Additionally, since the Y chromosome and mitochondrial are not of particular interest to us in this analysis, they can be excluded from the visualization.
sel.chrs <- c(1:21, "X")
FbtPlot(obj, val = "gene_name.padj", remove.chr = TRUE, sel.chrs = sel.chrs)
# Let's see how many exons are expressed in different spatial pattern with their genes
obj[['exon']][[]] %>% filter(gene_name.padj < 0.001) %>% knitr::kable()
chr | start | end | gene_name | strand | moransi.pval | moransi | autocorr.variable | gene_name.D | gene_name.r | gene_name.pval | gene_name.mean | gene_name.var | gene_name.padj | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr1:66247460-66247654/+/PDE4B | chr1 | 66247460 | 66247654 | PDE4B | + | 0.00e+00 | 0.0548647 | TRUE | 0.2621079 | 0.2731617 | 6e-07 | 0.1283106 | 0.0192591 | 0.0009169 |
chr1:153990763-153991034/+/RPS27 | chr1 | 153990763 | 153991034 | RPS27 | + | 0.00e+00 | 0.1368581 | TRUE | 0.3018767 | 0.0260007 | 0e+00 | 0.1251051 | 0.0207093 | 0.0000783 |
chr1:153990914-153991034/+/RPS27 | chr1 | 153990914 | 153991034 | RPS27 | + | 0.00e+00 | 0.1335295 | TRUE | 0.2996693 | 0.0039468 | 0e+00 | 0.1255460 | 0.0200928 | 0.0000693 |
chr1:154169305-154169383/-/TPM3 | chr1 | 154169305 | 154169383 | TPM3 | - | 0.00e+00 | 0.1122163 | TRUE | 0.3038352 | 0.3313057 | 5e-07 | 0.1318006 | 0.0242857 | 0.0007435 |
chr11:75421727-75422280/+/RPS3 | chr11 | 75421727 | 75422280 | RPS3 | + | 0.00e+00 | 0.0486905 | TRUE | 0.2782288 | 0.0047692 | 4e-07 | 0.1317754 | 0.0205402 | 0.0006932 |
chr11:113262846-113265197/+/NCAM1 | chr11 | 113262846 | 113265197 | NCAM1 | + | 0.00e+00 | 0.3785929 | TRUE | 0.3199220 | -0.0460674 | 1e-07 | 0.1313999 | 0.0236827 | 0.0001967 |
chr12:6943817-6944173/+/C12orf57 | chr12 | 6943817 | 6944173 | C12orf57 | + | 0.00e+00 | 0.0632402 | TRUE | 0.3289584 | -0.0669987 | 2e-07 | 0.1383928 | 0.0253422 | 0.0003642 |
chr12:53303034-53306861/+/ENSG00000288663 | chr12 | 53303034 | 53306861 | ENSG00000288663 | + | 0.00e+00 | 0.0260267 | TRUE | 0.3200202 | -0.2131500 | 0e+00 | 0.1276082 | 0.0160011 | 0.0000006 |
chr12:53306680-53306861/+/ENSG00000288663 | chr12 | 53306680 | 53306861 | ENSG00000288663 | + | 0.00e+00 | 0.0244792 | TRUE | 0.3213900 | -0.1151871 | 0e+00 | 0.1273542 | 0.0176750 | 0.0000026 |
chr12:53306966-53307171/+/ENSG00000288663 | chr12 | 53306966 | 53307171 | ENSG00000288663 | + | 2.40e-06 | 0.0187554 | TRUE | 0.3019310 | -0.0847415 | 0e+00 | 0.1298170 | 0.0172448 | 0.0000098 |
chr12:53306966-53307158/+/ENSG00000288663 | chr12 | 53306966 | 53307158 | ENSG00000288663 | + | 6.00e-07 | 0.0199697 | TRUE | 0.3034912 | -0.0741124 | 0e+00 | 0.1302050 | 0.0177485 | 0.0000132 |
chr12:53306966-53307122/+/ENSG00000288663 | chr12 | 53306966 | 53307122 | ENSG00000288663 | + | 9.85e-05 | 0.0151751 | TRUE | 0.2827103 | -0.1094897 | 0e+00 | 0.1308142 | 0.0175222 | 0.0000693 |
chr12:53306966-53307174/+/ENSG00000288663 | chr12 | 53306966 | 53307174 | ENSG00000288663 | + | 1.70e-06 | 0.0190489 | TRUE | 0.3037096 | -0.0734929 | 0e+00 | 0.1296050 | 0.0169589 | 0.0000068 |
chr12:56161387-56161465/+/MYL6 | chr12 | 56161387 | 56161465 | MYL6 | + | 0.00e+00 | 0.2503614 | TRUE | 0.6383526 | -0.2108757 | 0e+00 | 0.1356362 | 0.0265502 | 0.0000000 |
chr12:79872808-79872938/-/PPP1R12A | chr12 | 79872808 | 79872938 | PPP1R12A | - | 0.00e+00 | 0.0556075 | TRUE | 0.2986613 | -0.0076591 | 1e-07 | 0.1324250 | 0.0201976 | 0.0001272 |
chr15:43801518-43801569/+/SERF2 | chr15 | 43801518 | 43801569 | SERF2 | + | 0.00e+00 | 0.0959150 | TRUE | 0.3082031 | 0.2487588 | 7e-07 | 0.1285504 | 0.0259490 | 0.0009430 |
chr15:43801711-43804427/+/SERF2 | chr15 | 43801711 | 43804427 | SERF2 | + | 0.00e+00 | 0.1171575 | TRUE | 0.3974739 | 0.1772309 | 0e+00 | 0.1297066 | 0.0254541 | 0.0000049 |
chr15:60382342-60384827/-/ANXA2 | chr15 | 60382342 | 60384827 | ANXA2 | - | 0.00e+00 | 0.0547953 | TRUE | 0.2533718 | -0.1135954 | 3e-07 | 0.1397764 | 0.0155887 | 0.0005326 |
chr18:49481681-49482410/-/RPL17-C18orf32 | chr18 | 49481681 | 49482410 | RPL17-C18orf32 | - | 0.00e+00 | 0.2761749 | TRUE | 0.7631057 | 0.0794071 | 0e+00 | 0.1260113 | 0.0160785 | 0.0000000 |
chr18:49482152-49482410/-/RPL17-C18orf32 | chr18 | 49482152 | 49482410 | RPL17-C18orf32 | - | 1.75e-05 | 0.0168981 | TRUE | 0.2449828 | 0.0850250 | 4e-07 | 0.1300614 | 0.0160170 | 0.0006486 |
chr19:16093684-16093753/+/TPM4 | chr19 | 16093684 | 16093753 | TPM4 | + | 0.00e+00 | 0.1004265 | TRUE | 0.3007158 | -0.0920925 | 0e+00 | 0.1310870 | 0.0165675 | 0.0000068 |
chr19:16095264-16095357/+/TPM4 | chr19 | 16095264 | 16095357 | TPM4 | + | 0.00e+00 | 0.1941885 | TRUE | 0.5340904 | 0.0459141 | 0e+00 | 0.1331069 | 0.0199821 | 0.0000000 |
chr19:16095264-16096744/+/TPM4 | chr19 | 16095264 | 16096744 | TPM4 | + | 0.00e+00 | 0.2457950 | TRUE | 0.5685076 | 0.1042584 | 0e+00 | 0.1304218 | 0.0178597 | 0.0000000 |
chr19:16095264-16095454/+/TPM4 | chr19 | 16095264 | 16095454 | TPM4 | + | 0.00e+00 | 0.2415783 | TRUE | 0.5781947 | 0.1136440 | 0e+00 | 0.1328817 | 0.0190551 | 0.0000000 |
chr19:16095264-16095893/+/TPM4 | chr19 | 16095264 | 16095893 | TPM4 | + | 0.00e+00 | 0.2486778 | TRUE | 0.5701326 | 0.1121110 | 0e+00 | 0.1299992 | 0.0177430 | 0.0000000 |
chr19:16095264-16095591/+/TPM4 | chr19 | 16095264 | 16095591 | TPM4 | + | 0.00e+00 | 0.2499397 | TRUE | 0.5724202 | 0.1037189 | 0e+00 | 0.1302904 | 0.0178103 | 0.0000000 |
chr19:16095264-16095496/+/TPM4 | chr19 | 16095264 | 16095496 | TPM4 | + | 0.00e+00 | 0.2563288 | TRUE | 0.5886899 | 0.1312048 | 0e+00 | 0.1301594 | 0.0186750 | 0.0000000 |
chr19:18169087-18170494/+/ENSG00000268173 | chr19 | 18169087 | 18170494 | ENSG00000268173 | + | 0.00e+00 | 0.0508129 | TRUE | 0.4483024 | 0.2271843 | 0e+00 | 0.1346315 | 0.0180476 | 0.0000000 |
chr2:218344808-218346793/+/PNKD | chr2 | 218344808 | 218346793 | PNKD | + | 0.00e+00 | 0.0903808 | TRUE | 0.4729872 | 0.0630592 | 0e+00 | 0.1279734 | 0.0235852 | 0.0000000 |
chr2:218344808-218346784/+/PNKD | chr2 | 218344808 | 218346784 | PNKD | + | 0.00e+00 | 0.0893280 | TRUE | 0.4713933 | 0.0151155 | 0e+00 | 0.1276680 | 0.0236770 | 0.0000000 |
chr2:218344808-218346756/+/PNKD | chr2 | 218344808 | 218346756 | PNKD | + | 0.00e+00 | 0.0890982 | TRUE | 0.4717710 | 0.0077590 | 0e+00 | 0.1276497 | 0.0241184 | 0.0000000 |
chr2:218344808-218346791/+/PNKD | chr2 | 218344808 | 218346791 | PNKD | + | 0.00e+00 | 0.0903808 | TRUE | 0.4729872 | 0.0630592 | 0e+00 | 0.1279734 | 0.0235852 | 0.0000000 |
chr2:218344808-218346771/+/PNKD | chr2 | 218344808 | 218346771 | PNKD | + | 0.00e+00 | 0.0882877 | TRUE | 0.4707117 | 0.0233973 | 0e+00 | 0.1274043 | 0.0239968 | 0.0000000 |
chr3:181175514-181176513/+/SOX2-OT | chr3 | 181175514 | 181176513 | SOX2-OT | + | 0.00e+00 | 0.1538967 | TRUE | 0.3151647 | 0.2429422 | 0e+00 | 0.1308805 | 0.0215551 | 0.0000783 |
chr3:181175514-181176523/+/SOX2-OT | chr3 | 181175514 | 181176523 | SOX2-OT | + | 0.00e+00 | 0.1538967 | TRUE | 0.3151647 | 0.2429422 | 0e+00 | 0.1308805 | 0.0215551 | 0.0000783 |
chr3:181175514-181176524/+/SOX2-OT | chr3 | 181175514 | 181176524 | SOX2-OT | + | 0.00e+00 | 0.1534972 | TRUE | 0.3141340 | 0.2532079 | 0e+00 | 0.1307889 | 0.0217430 | 0.0000912 |
chr5:149549328-149549513/-/CSNK1A1 | chr5 | 149549328 | 149549513 | CSNK1A1 | - | 0.00e+00 | 0.0419463 | TRUE | 0.2923114 | -0.2321349 | 1e-07 | 0.1313981 | 0.0202799 | 0.0002003 |
chr7:25123444-25123849/-/CYCS | chr7 | 25123444 | 25123849 | CYCS | - | 0.00e+00 | 0.0582182 | TRUE | 0.2625515 | -0.0035186 | 6e-07 | 0.1313296 | 0.0187426 | 0.0008400 |
chr7:104961984-104963216/+/KMT2E | chr7 | 104961984 | 104963216 | KMT2E | + | 3.50e-06 | 0.0179554 | TRUE | 0.2564779 | 0.2039581 | 6e-07 | 0.1303578 | 0.0179934 | 0.0008400 |
chr8:56067269-56068344/-/RPS20 | chr8 | 56067269 | 56068344 | RPS20 | - | 0.00e+00 | 0.0249181 | TRUE | 0.2801314 | 0.0894321 | 2e-07 | 0.1317068 | 0.0193942 | 0.0002907 |
chr8:56067295-56068530/-/RPS20 | chr8 | 56067295 | 56068530 | RPS20 | - | 0.00e+00 | 0.0266165 | TRUE | 0.2852237 | 0.0729618 | 1e-07 | 0.1316712 | 0.0192429 | 0.0001956 |
chr8:60620674-60620895/+/RAB2A | chr8 | 60620674 | 60620895 | RAB2A | + | 0.00e+00 | 0.0224659 | TRUE | 0.2985668 | 0.1232095 | 2e-07 | 0.1367841 | 0.0215991 | 0.0003763 |
chr8:60620674-60620938/+/RAB2A | chr8 | 60620674 | 60620938 | RAB2A | + | 0.00e+00 | 0.0262698 | TRUE | 0.3053199 | 0.0488592 | 1e-07 | 0.1349146 | 0.0220615 | 0.0002733 |
chr8:60620674-60620948/+/RAB2A | chr8 | 60620674 | 60620948 | RAB2A | + | 0.00e+00 | 0.0261589 | TRUE | 0.3045068 | 0.0533409 | 2e-07 | 0.1348457 | 0.0221765 | 0.0002907 |
chr8:60620674-60620941/+/RAB2A | chr8 | 60620674 | 60620941 | RAB2A | + | 0.00e+00 | 0.0261589 | TRUE | 0.3045068 | 0.0533409 | 2e-07 | 0.1348457 | 0.0221765 | 0.0002907 |
chr8:60620674-60620988/+/RAB2A | chr8 | 60620674 | 60620988 | RAB2A | + | 0.00e+00 | 0.0259009 | TRUE | 0.3048041 | 0.0518757 | 1e-07 | 0.1339123 | 0.0216852 | 0.0002092 |
chrX:119583127-119583347/+/UBE2A | chrX | 119583127 | 119583347 | UBE2A | + | 0.00e+00 | 0.0392251 | TRUE | 0.2987353 | -0.1300155 | 1e-07 | 0.1383466 | 0.0202844 | 0.0002048 |
chrX:119583127-119584101/+/UBE2A | chrX | 119583127 | 119584101 | UBE2A | + | 0.00e+00 | 0.0360517 | TRUE | 0.2840007 | -0.1164555 | 3e-07 | 0.1391136 | 0.0195684 | 0.0004340 |
chrX:119583127-119583417/+/UBE2A | chrX | 119583127 | 119583417 | UBE2A | + | 0.00e+00 | 0.0424595 | TRUE | 0.3005669 | -0.1607102 | 0e+00 | 0.1388922 | 0.0169531 | 0.0000184 |
chrX:119583127-119583621/+/UBE2A | chrX | 119583127 | 119583621 | UBE2A | + | 0.00e+00 | 0.0408935 | TRUE | 0.2958513 | -0.1531455 | 0e+00 | 0.1391206 | 0.0166949 | 0.0000225 |
chrX:151404916-151405157/+/VMA21 | chrX | 151404916 | 151405157 | VMA21 | + | 0.00e+00 | 0.0540251 | TRUE | 0.3411674 | -0.3238006 | 0e+00 | 0.1355843 | 0.0228122 | 0.0000410 |
# Random select a gene and its exons and visulize with FeaturePlot.
FeaturePlot(obj, features = c("chr15:43801711-43804427/+/SERF2", "SERF2"), ncol=2)
# The default color and parameters perhaps not easily to tell the difference between exon and its binding gene expression. Let's change the scaled colors and enlarge point size and order by expression.
require(RColorBrewer)
Loading required package: RColorBrewer
FeaturePlot(obj, features = c("chr15:43801711-43804427/+/SERF2", "SERF2"), ncol=2, order = TRUE, pt.size=1) & scale_colour_gradientn(colours = rev(brewer.pal(n = 11, name = "RdBu")))
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
We can also map the ratio of exon expression to gene expression on the UMAP. The RatioPlot
function is designed for this purpose. As observed, the gene SERF2 is relatively low expressed in groups 0, 5, and 9, while the ratio of the exon chr15:43801711-43804427/+/SERF2 is higher in these groups.
RatioPlot(obj, features = c("chr15:43801711-43804427/+/SERF2"), assay = 'exon', bind.assay = 'RNA', bind.name = "gene_name", order = TRUE, pt.size=1)
In the feature plot and ratio plot above, the exon appears to lack a strong expression pattern across cell groups, whereas the gene SERF2 seems to be highly expressed in many groups, except for groups 0, 5 and 9. This inconsistent expression pattern between the exon and its corresponding gene may suggest differential exon usage. To explore the coverage details of both the exon and the gene body, we will generate a track plot next.
In our package, retrieving gene locations requires loading a GTF file instead of relying on current Bioconductor databases, such as org.Hs.eg.db
. This is due to the varying versions of gene annotations provided by different institutes, which can introduce inconsistencies. To avoid potential bias during preprocessing and postprocessing, we strongly recommend using the same GTF file consistently throughout your project. The Yano package includes the gtf2db
function, which enables you to load a GTF file into memory for further analysis.
[2025-02-20 18:15:04] GTF loading..
[2025-02-20 18:15:16] Load 62700 genes.
A track plot is used to study the read coverage per cell group. In the track plot shown below, the cell group is specified by the cell.group
parameter. Unlike IGV, where read depth is used, we use UMI depth in this plot. The cell barcode tag and UMI tag are predefined as “CB” and “UB” with parameter cell.tag
and umi.tag
. For each cell group, the UMI depth has been normalized by the number of cells in that group. This means that the depth at each location can be interpreted as the mean UMI depth per cell for the group. As a result, the tracks are directly comparable across different cell groups. If cell.group
is not set, the track plot will generate the raw UMI depth per location.
TrackPlot(bamfile = "Parent_SC3v3_Human_Glioblastoma_possorted_genome_bam.bam", gtf = gtf, gene = "SERF2", cell.group = Idents(obj), highlights = c(43801711,43804427) )
In the track plot, we can easily observe that an exon around position 43,794,000 dominates the expression of the SERF2 gene and is highly expressed in many cell groups. However, the exon ‘chr15:43801711-43804427/+/SERF2’ (highlighted) shows low expression and is not visible in the track plot. To visualize low-expressed exons, we can set the max.depth
parameter to 2, which caps the UMI depth at 2. And many genes in the region, we set display.genes
to SERF2 only. This adjustment allows the low-expressed exons and their related transcripts to be more clearly represented in the plot. In this case, we can found the the highlighed exon shows different expressed pattern with the gene SERF2.
In addition to exon expression, junction expression can provide insights into different expression patterns across transcripts, offering a complementary perspective. Junction expression refers to the UMI counts of reads that span more than one exon. It’s important to note that junctions are named similarly to exons, but the start and end positions are different. The start of the junction corresponds to the end of the previous exon, while the end of the junction represents the start of the next exon.
junction <- ReadPISA("./junction/")
obj[['junction']] <- CreateAssayObject(junction[, colnames(obj)], min.cells=20)
DefaultAssay(obj) <- "junction"
obj <- NormalizeData(obj)
# select spatial autocorrelated junctions
obj <- RunAutoCorr(obj)
Working on assay : junction
Run autocorrelation test for 14786 features.
Runtime : 1.228947 secs
7697 autocorrelated features.
Working on assay junction
# perform dissimilarity test between junctions and their binding genes
obj <- RunBlockCorr(obj, bind.name = "gene_name", bind.assay = "RNA", perm=20)
Working on assay junction.
Working on binding assay RNA.
Processing 7697 features.
Processing 3838 blocks.
Retrieve binding data from assay RNA.
Use "data" layer for test features and binding features.
Using 7 threads.
Using method "D" with mode 1.
Use predefined weight matrix "pca_wm".
Runtime : 7.487403 secs.
moransi.pval moransi
chr12:53306861-53306966/+/ENSG00000288663 4.801453e-24 0.04183484
chr12:56160320-56161387/+/MYL6 0.000000e+00 0.19062814
chr19:16093753-16095264/+/TPM4 0.000000e+00 0.19011627
autocorr.variable chr start
chr12:53306861-53306966/+/ENSG00000288663 TRUE chr12 53306861
chr12:56160320-56161387/+/MYL6 TRUE chr12 56160320
chr19:16093753-16095264/+/TPM4 TRUE chr19 16093753
end gene_name strand
chr12:53306861-53306966/+/ENSG00000288663 53306966 ENSG00000288663 +
chr12:56160320-56161387/+/MYL6 56161387 MYL6 +
chr19:16093753-16095264/+/TPM4 16095264 TPM4 +
gene_name.D gene_name.r
chr12:53306861-53306966/+/ENSG00000288663 0.3262212 -0.14110963
chr12:56160320-56161387/+/MYL6 0.4844907 -0.03427538
chr19:16093753-16095264/+/TPM4 0.5162236 0.09774324
gene_name.pval gene_name.mean
chr12:53306861-53306966/+/ENSG00000288663 8.814720e-13 0.1290557
chr12:56160320-56161387/+/MYL6 4.848989e-11 0.1395560
chr19:16093753-16095264/+/TPM4 2.125092e-14 0.1316502
gene_name.var gene_name.padj
chr12:53306861-53306966/+/ENSG00000288663 0.01232423 3.392345e-09
chr12:56160320-56161387/+/MYL6 0.02712959 1.244089e-07
chr19:16093753-16095264/+/TPM4 0.01953657 1.635683e-10
# Because both exon and junction are compared with gene, so it's reasonable to combine these two assays in one plot
FbtPlot(obj, val="gene_name.padj", assay = c("exon", "junction"), col.by = "assay", shape.by = "assay", pt.size = 2, remove.chr = TRUE, sel.chrs = sel.chrs, cols = c("red", "blue"))
# We can find there is an exon and a junction at chromosome 12 with very low p value (<1e-8), let's see which gene they are located
obj[['exon']][[]] %>% filter(chr == "chr12" & gene_name.pval < 1e-8) %>% knitr::kable()
chr | start | end | gene_name | strand | moransi.pval | moransi | autocorr.variable | gene_name.D | gene_name.r | gene_name.pval | gene_name.mean | gene_name.var | gene_name.padj | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr12:53303034-53306861/+/ENSG00000288663 | chr12 | 53303034 | 53306861 | ENSG00000288663 | + | 0.0e+00 | 0.0260267 | TRUE | 0.3200202 | -0.2131500 | 0 | 0.1276082 | 0.0160011 | 6.00e-07 |
chr12:53306680-53306861/+/ENSG00000288663 | chr12 | 53306680 | 53306861 | ENSG00000288663 | + | 0.0e+00 | 0.0244792 | TRUE | 0.3213900 | -0.1151871 | 0 | 0.1273542 | 0.0176750 | 2.60e-06 |
chr12:53306966-53307171/+/ENSG00000288663 | chr12 | 53306966 | 53307171 | ENSG00000288663 | + | 2.4e-06 | 0.0187554 | TRUE | 0.3019310 | -0.0847415 | 0 | 0.1298170 | 0.0172448 | 9.80e-06 |
chr12:53306966-53307158/+/ENSG00000288663 | chr12 | 53306966 | 53307158 | ENSG00000288663 | + | 6.0e-07 | 0.0199697 | TRUE | 0.3034912 | -0.0741124 | 0 | 0.1302050 | 0.0177485 | 1.32e-05 |
chr12:53306966-53307174/+/ENSG00000288663 | chr12 | 53306966 | 53307174 | ENSG00000288663 | + | 1.7e-06 | 0.0190489 | TRUE | 0.3037096 | -0.0734929 | 0 | 0.1296050 | 0.0169589 | 6.80e-06 |
chr12:56161387-56161465/+/MYL6 | chr12 | 56161387 | 56161465 | MYL6 | + | 0.0e+00 | 0.2503614 | TRUE | 0.6383526 | -0.2108757 | 0 | 0.1356362 | 0.0265502 | 0.00e+00 |
moransi.pval | moransi | autocorr.variable | chr | start | end | gene_name | strand | gene_name.D | gene_name.r | gene_name.pval | gene_name.mean | gene_name.var | gene_name.padj | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr12:53306861-53306966/+/ENSG00000288663 | 0 | 0.0418348 | TRUE | chr12 | 53306861 | 53306966 | ENSG00000288663 | + | 0.3262212 | -0.1411096 | 0 | 0.1290557 | 0.0123242 | 0e+00 |
chr12:56160320-56161387/+/MYL6 | 0 | 0.1906281 | TRUE | chr12 | 56160320 | 56161387 | MYL6 | + | 0.4844907 | -0.0342754 | 0 | 0.1395560 | 0.0271296 | 1e-07 |
FeaturePlot(obj, features = c("chr12:56161387-56161465/+/MYL6","chr12:56160320-56161387/+/MYL6", "MYL6"), order = TRUE, pt.size = 2, ncol=3) & scale_colour_gradientn(colours = rev(brewer.pal(n = 11, name = "RdBu")))
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
# We could also plot the expression ratio of these exons or junctions on umap
p1 <- RatioPlot(obj, assay = "exon", bind.assay = "RNA", bind.name = "gene_name", features = "chr12:56161387-56161465/+/MYL6")
p2 <- RatioPlot(obj, assay = "exon", bind.assay = "RNA", bind.name = "gene_name", features = "chr12:56160626-56160670/+/MYL6")
p3 <- RatioPlot(obj, assay = "junction", bind.assay = "RNA", bind.name = "gene_name", features = "chr12:56160320-56161387/+/MYL6")
cowplot::plot_grid(p1,p2,p3, ncol=3)
We then visualize the track plot for this gene, including junction reads by setting junc=TRUE
. The height of the splice paths in the plot represents the expression level of each junction within the specified cell group.
TrackPlot(bamfile = "Parent_SC3v3_Human_Glioblastoma_possorted_genome_bam.bam", gtf = gtf, gene = "MYL6", cell.group = Idents(obj), junc = TRUE, highlights = list(c(56160320,56161387),c(56161387,56161465)))
You might be wondering why the exon chr12:56161387-56161465/+/MYL6 appears highly expressed in cell group 2 in the track plot, where the overlapping peak is clearly higher than in other groups, but its expression level in the feature plot is not as high as expected.
This discrepancy arises because the exon is overlapping with other exons from different transcripts. We only count reads that are fully contained within the exon as part of the exon’s expression. Therefore, reads that partially overlap with this exon are not included in the count.
In contrast, the overlapping exon chr12:56161387-56161575/+/MYL6 shows higher expression in group 2 compared to other groups. It’s important to note that if a read is fully contained within two or more overlapping exons, PISA will count it for all relevant exons. Check PISA’s manual for details.
p1 <- DimPlot(obj, label=TRUE, label.size = 5, label.box = TRUE)
p2 <- FeaturePlot(obj, features = c("chr12:56161387-56161575/+/MYL6"), order = TRUE, pt.size = 1) & scale_colour_gradientn(colours = rev(brewer.pal(n = 11, name = "RdBu")))
Warning: Could not find chr12:56161387-56161575/+/MYL6 in the default search
locations, found in 'exon' assay instead
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
The spatial dissimilarity test method prioritizes alternatively spliced exons and junctions across all cells but does not identify which specific cell groups exhibit these splicing events. To address this, let’s manually extract the scaled expression data for the selected alternatively spliced exons and their corresponding genes, then perform a co-clustering analysis. A comprehensive heatmap will be generated using the ComplexHeatmap package, providing a visual representation of the exon and gene distribution across cell groups.
obj[['exon']][[]] %>% filter(gene_name.padj<0.001) %>% rownames -> exons
obj[['exon']][[]] %>% filter(gene_name.padj<0.001) %>% pull(gene_name) -> bind.genes
DefaultAssay(obj) <- "RNA"
obj <- ScaleData(obj, features = unique(bind.genes))
DefaultAssay(obj) <- "exon"
obj <- ScaleData(obj, features = exons)
dat1 <- GetAssayData(obj, assay = 'exon', layer = 'scale.data')
dat2 <- GetAssayData(obj, assay = 'RNA', layer = 'scale.data')
idents <- sort(Idents(obj))
order.cells <- names(idents)
dat2 <- dat2[bind.genes,]
rownames(dat2) <- exons
dat <- cbind(dat1, dat2)
require(ComplexHeatmap)
d <- dist(dat)
hc <- hclust(d)
idx <- hc$labels[hc$order]
ha <- HeatmapAnnotation(group=idents, border = TRUE)
ht1 <- Heatmap(dat1[idx, order.cells], cluster_rows = FALSE, cluster_columns = FALSE, show_column_names = FALSE, border = TRUE, top_annotation = ha, name = "exon", column_title = "exon")
ht2 <- Heatmap(dat2[idx, order.cells], cluster_rows = FALSE, cluster_columns = FALSE, show_column_names = FALSE, border = TRUE, top_annotation = ha, name = "gene", column_title = "gene", row_names_max_width = max_text_width(rownames(dat2), gp = gpar(fontsize = 12)))
ht <- ht1 + ht2
draw(ht, heatmap_legend_side = "left", annotation_legend_side = "left")
In the previous sections, we conducted a spatial dissimilarity test between exon/junction expression and gene expression. However, binding features are not always limited to genes; they can also correspond to other types of features. In this section, we perform a test between exon expression and reads that skip this exon (exclude assay). This approach is similar to the Percent Spliced In (PSI) method, which is widely used to analyze alternative splicing in both bulk and single-cell RNA-seq data. The PSI is calculated as: PSI = exon reads / (exon reads + reads skipping this exon)
.
# The reads that skip exons are annotated using the `-psi` option in PISA anno, and these counts are stored in the `exclude` directory. We then load these excluded counts into a new assay.
exclude <- ReadPISA("exclude/")
obj[['exclude']] <- CreateAssayObject(exclude[,colnames(obj)], min.cells = 10)
DefaultAssay(obj) <- "exclude"
# Normalize counts for exon-excluded reads
obj <- NormalizeData(obj)
# Then we switch to exon assay
DefaultAssay(obj) <- "exon"
# Because the feature names in the exclude assay are exactly the same as those in the exon assay, they represent the reads that skip each corresponding exon. Therefore, we set up the binding feature using the exon name itself.
obj[['exon']][['exon_name']] <- rownames(obj)
obj[['exon']][['exon_name']] %>% head
exon_name
chr1:135141-135895/-/ENSG00000268903 chr1:135141-135895/-/ENSG00000268903
chr1:629640-630683/+/MTND2P28 chr1:629640-630683/+/MTND2P28
chr1:631074-632616/+/MTCO1P12 chr1:631074-632616/+/MTCO1P12
chr1:632757-633438/+/MTCO2P12 chr1:632757-633438/+/MTCO2P12
chr1:633696-634376/+/MTATP6P1 chr1:633696-634376/+/MTATP6P1
chr1:634376-634922/+/MTCO3P12 chr1:634376-634922/+/MTCO3P12
# Then we perform spatial dissimilarity test between exon and exclude, mode 1
obj <- RunBlockCorr(obj, bind.name = "exon_name", bind.assay = "exclude")
# Swith to exon exluded assay
DefaultAssay(obj) <- "exclude"
obj <- RunAutoCorr(obj)
obj <- SetAutoCorrFeatures(obj)
obj <- ParseExonName(obj)
obj[['exclude']][['exon_name']] <- rownames(obj)
obj <- RunBlockCorr(obj, bind.name = "exon_name", bind.assay = "exon")
FbtPlot(obj, val = "exon_name.padj", remove.chr = TRUE, assay = c("exclude", "exon"), shape.by = "assay", col.by = "assay", cols = c("yellow", "green"), pt.size = 2)
# Let's how many exons can be prioritized by both exon assay and exclude assay
obj[['exclude']][[]] %>% filter(exon_name.padj<1e-5) %>% rownames -> sel1
obj[['exon']][[]] %>% filter(exon_name.padj<1e-5) %>% rownames -> sel2
intersect(sel1,sel2)
[1] "chr1:19342752-19342864/-/CAPZB" "chr1:153990914-153991034/+/RPS27"
[3] "chr10:7806974-7807010/+/ATP5F1C" "chr12:56160626-56160670/+/MYL6"
[5] "chr12:56160626-56160945/+/MYL6" "chr13:28670669-28672120/+/POMP"
[7] "chr19:16095264-16095357/+/TPM4" "chr19:16095264-16095454/+/TPM4"
[9] "chr19:16095264-16095496/+/TPM4" "chr19:16095264-16095893/+/TPM4"
[11] "chr19:16095264-16096744/+/TPM4" "chr19:16095264-16095591/+/TPM4"
[13] "chr2:197490527-197490664/-/HSPD1" "chr20:37238402-37238449/+/RPN2"
[15] "chr20:49280903-49280964/+/ZFAS1" "chr3:197953488-197953660/+/RPL35A"
[17] "chr4:82425563-82425667/-/HNRNPDL" "chr5:83519349-83522309/+/VCAN"
[19] "chr9:35684732-35684807/-/TPM2" "chr9:35684732-35684802/-/TPM2"
[21] "chrX:154400464-154400626/+/RPL10"
DefaultAssay(obj) <- "exclude"
p1 <- FeaturePlot(obj, features = c("chr19:16095264-16095357/+/TPM4"),order = TRUE)
DefaultAssay(obj) <- "exon"
p2 <- FeaturePlot(obj, features = c("chr19:16095264-16095357/+/TPM4"),order = TRUE)
p3 <- PSIPlot(obj, exon.assay = "exon", exclude.assay = "exclude", features = c("chr19:16095264-16095357/+/TPM4"),order = TRUE)
cowplot::plot_grid(p1,p2,p3, ncol=3)
TrackPlot(bamfile = "Parent_SC3v3_Human_Glioblastoma_possorted_genome_bam.bam", gtf = gtf, gene = "TPM4", cell.group = Idents(obj), highlights = c(16095264,16095357), junc = TRUE, max.depth = 1)
In previous sections, we noted that exon or junction expression is part of gene expression, and inverse expression patterns can strongly indicate alternative splicing. However, exon-skipped reads are largely independent of exon expression, making them more sensitive for detecting alternative splicing and allowing for the prioritization of many events.
Because our spatial dissimilarity test does not account for the spatial dependency of the binding feature, numerous events may be prioritized, especially when the binding feature is sparsely expressed. While some of these events may be true, others might arise due to low coverage. To enhance detection power and reduce potential false positives, intersecting the prioritized exons with the prioritized exon-excluded features can help refine the results.
Mode 3 can be used as an alternative to mode 1 to specifically detect events with strong inverse expression patterns. In mode 3, the exon reads and exon-excluded reads are summed as the binding assay, allowing for a more targeted analysis of such patterns. It is important to note that events detected with mode 3 are always detectable with mode 1, making mode 3 a refined approach for prioritizing inverse expression events.
DefaultAssay(obj) <- "exclude"
obj <- RunBlockCorr(obj, bind.name = "exon_name", bind.assay = "exon", mode = 3, prefix = "mode3")
DefaultAssay(obj) <- "exon"
obj <- RunBlockCorr(obj, bind.name = "exon_name", bind.assay = "exclude", mode = 3, prefix = "mode3")
FbtPlot(obj, val = "mode3.padj", remove.chr = TRUE, assay = c("exclude", "exon"), shape.by = "assay", col.by = "assay", pt.size = 2, cols=c("yellow", "green"))
In this case study, we performed a spatial dissimilarity test between various feature pairs. This method provides an overview of the entire cell population and does not rely on prior cell clustering and annotation, making it a powerful tool for analyzing cell data without any prior knowledge. It is recommended to test different features including junctions and exons, and set up different with their corresponding genes in 3’ or 5’ biased scRNA-seq. Additionally, testing between exon-included and exon-skipped reads have more power to detect exon excluded events.
To obtain cell-cluster-specific expression patterns, applying heatmaps and clustering in subsequent analyses is recommended.
If you have any questions regarding this vignette and the usage of Yano, please feel free to report them through the discussion forum. When submitting your query, please ensure you attach the commands you used for better clarity and support.
[1] "NormalizeData.RNA" "FindVariableFeatures.RNA"
[3] "RunPCA.RNA" "FindNeighbors.RNA.pca"
[5] "FindClusters" "RunUMAP.RNA.pca"
[7] "ParseExonName.exon" "NormalizeData.exon"
[9] "RunAutoCorr.exon.pca" "SetAutoCorrFeatures.exon"
[11] "NormalizeData.junction" "RunAutoCorr.junction.pca"
[13] "SetAutoCorrFeatures.junction" "ParseExonName.junction"
[15] "RunBlockCorr.junction.pca" "ScaleData.RNA"
[17] "ScaleData.exon" "NormalizeData.exclude"
[19] "RunAutoCorr.exclude.pca" "SetAutoCorrFeatures.exclude"
[21] "ParseExonName.exclude" "RunBlockCorr.exclude.pca"
[23] "RunBlockCorr.exon.pca"