Spatial dissimilarity analysis for single-cell trajectories and supervised embeddings
In the previous vignette, we performed cell clustering using unsupervised methods. Typically, a PCA or integrated space (e.g., Harmony) is generated during this process, and we then calculate the cell weight matrix directly from the cell embedding space. This approach is convenient for running the pipeline, and provide the general result in all cell population. Furthermore, for specific signaling or biological pathways, it is often more informative to focus on a ‘supervised’ cell ordering. This vignette demonstrates how to perform spatial dissimilarity test in low-dimensional spaces, such as cell trajectory.
Prepare raw data
We used the human testis single-cell atlas dataset for this analysis. The full dataset is publicly available at GSE112013. For this vignette, we only used replicate 1.
## We first download the BAM files and GTF filewget-c https://sra-pub-src-1.s3.amazonaws.com/SRR6860519/14515X1.bam.1 -O donor1_rep1.bamwget-c https://sra-pub-src-1.s3.amazonaws.com/SRR6860520/14606X1.bam.1 -O donor1_rep2.bam## Please Note: we use a different annotation file from previos vignette, because the BAM file use NCBI style chromosome name here.wget https://ftp.ensembl.org/pub/release-114/gtf/homo_sapiens/Homo_sapiens.GRCh38.114.chr.gtf.gz## Index the bam files for track plotssamtools index donor1_rep1.bamsamtools index donor1_rep2.bam## Then we use PISA to generate feature countsPISA anno -gtf Homo_sapiens.GRCh38.114.chr.gtf.gz -exon-psi-o d1_rep2_anno.bam donor1_rep2.bamPISA anno -gtf Homo_sapiens.GRCh38.114.chr.gtf.gz -exon-psi-o d1_rep1_anno.bam donor1_rep1.bammkdir exp1mkdir exp2mkdir exon1mkdir exon2mkdir junction1mkdir junction2mkdir exclude1mkdir exclude2PISA count -tag CB -umi UB -anno-tag GN -outdir exp1 d1_rep1_anno.bamPISA count -tag CB -umi UB -anno-tag GN -outdir exp2 d1_rep2_anno.bamPISA count -tag CB -umi UB -anno-tag EX -outdir exon1 d1_rep1_anno.bamPISA count -tag CB -umi UB -anno-tag EX -outdir exon2 d1_rep2_anno.bamPISA count -tag CB -umi UB -anno-tag JC -outdir junction1 d1_rep1_anno.bamPISA count -tag CB -umi UB -anno-tag JC -outdir junction2 d1_rep2_anno.bam# Note: Although we generate the excluded reads here, we do not use them in this demonstration for the sake of simplicity. However, it is recommended to test them in real cases, so I’ve kept the data available for you to use in your own analyses.PISA count -tag CB -umi UB -anno-tag ER -outdir exclude1 d1_rep1_anno.bamPISA count -tag CB -umi UB -anno-tag ER -outdir exclude2 d1_rep2_anno.bam
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 4400
Number of edges: 134553
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9639
Number of communities: 13
Elapsed time: 0 seconds
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
Warning: The `slot` argument of `FetchData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
ℹ The deprecated feature was likely used in the Seurat package.
Please report the issue at <https://github.com/satijalab/seurat/issues>.
We now select germline cells from the dataset and construct the cell development trajectory using a principal curve. Although there are many well-documented methods for constructing linear trajectories, here we use a straightforward and perhaps the simplest approach purely for demonstration purposes.
Warning: The `slot` argument of `SetAssayData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
ℹ The deprecated feature was likely used in the Seurat package.
Please report the issue at <https://github.com/satijalab/seurat/issues>.
Warning: The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
ℹ The deprecated feature was likely used in the Seurat package.
Please report the issue at <https://github.com/satijalab/seurat/issues>.
obj.sel <-ParseExonName(obj.sel)
When setup order cells, the weight matrix name will be named to “trajectory_wm”.
FeaturePlot(obj.sel, features =c("3:52413996-52414048/+/PHF7", "3:52414048-52414496/+/PHF7", "PHF7"), ncol=3, order=TRUE)
Warning: Could not find 3:52413996-52414048/+/PHF7 in the default search
locations, found in 'exon' assay instead
Warning: Could not find PHF7 in the default search locations, found in 'RNA'
assay instead
Perform alternative splicing analysis for all cell space
To use all cell space, we do not need to specify a name for the weight matrix here, because it will default to using the PCA reduction map to generate the weight matrix. The resulting matrix will be named “reduction_wm”.
Now we have two weight matrices. If no name is specified in RunSDT(), the first matrix will always be used by default, so we need to explicitly specify “pca_wm” to avoid misusing the weight matrix here.
Use "data" layer for test features and binding features.
Using 7 threads.
Runtime : 1.231873 mins.
By comparing the use of cell trajectory ordering with the PCA space, we found that using the PCA approach can detect more alternatively expressed genes. This is because forcing all cells into a 1D trajectory ordering preserves only the rank of each cell, but loses much of the neighborhood information about cells in the broader context. Since cell trajectories are typically constructed based on cell–cell distances in high-dimensional space, regardless of the algorithm used to build the trajectory, I personally recommend using the cell space directly rather than relying solely on 1D cell ordering, unless you have specific reasons or detailed knowledge.
FbtPlot(obj.sel, val ="all.padj", assay =c("exon", "junction"), shape.by ="assay", color.by ="assay")
Perform alternative splicing on a supervised two dimension space
We can also perform the analysis using a supervised two-dimensional cell space. As an example, we utilize Seurat’s built-in cell cycle scoring scheme for demonstration purposes.