• Overview
  • FASTQ+
  • PISA
  • Yano
  • My GitHub Page
  • Discussions
  1. Workflows & Short cases
  2. Allele-specific gene expression analysis
  • Workflows & Short cases
    • From FASTQ to counts
    • Annotate various features
    • Select cells from reduction maps
    • Alternative splicing analysis
    • Allele-specific gene expression analysis
    • Annotating genetic variants
    • Analysis multiple Visium samples
    • Cell trajectory analysis

On this page

  • 0. Perform cell clustering with Seurat
  • 1. Perform allele-specific gene expression analysis by using SNV anchors
  1. Workflows & Short cases
  2. Allele-specific gene expression analysis

Alelle specific gene expression analysis for single cell RNA sequencing

This vignette uses gene and single nucleotide variant (SNV) expression files generated from the Annotate Various Features for Alignment.

0. Perform cell clustering with Seurat

We begin by performing cell clustering based on gene expression, which is identical to the first step outlined in the Alternative splicing analysis for scRNA-seq. If you’re continuing from the previous vignette, you are free to skip this section.

require(Yano)
Loading required package: Yano
── Attaching packages ────────────────────────────────────────────── Yano 1.2 ──
✔ dplyr   1.1.4     ✔ Seurat  5.3.0
✔ ggplot2 4.0.0     
exp <- ReadPISA("./exp/")

obj <- CreateSeuratObject(exp, min.features = 1000, min.cells = 10)
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
obj <- subset(obj, nFeature_RNA < 9000 & percent.mt < 20)

# Downsampling to 2000 cells for fast testing
obj <- obj[, sample(colnames(obj),2000)]

# To improve visualization, we reduced the resolution when identifying cell clusters.
obj <- NormalizeData(obj) %>% FindVariableFeatures() %>% ScaleData() %>%  RunPCA(verbose=FALSE) %>% FindNeighbors(dims = 1:10, verbose=FALSE) %>% FindClusters(resolution = 0.1, verbose=FALSE) %>% RunUMAP(dims=1:10, verbose=FALSE)
Normalizing layer: counts
Finding variable features for layer counts
Centering and scaling data matrix
DimPlot(obj, label=TRUE, label.size = 5, label.box = TRUE)

1. Perform allele-specific gene expression analysis by using SNV anchors

In this vignette, when a genetic variant is detected, all possible alleles at that location, including both alternative alleles and the reference allele, are annotated and counted. Therefore, genetic variants refer to both alternative alleles and the reference allele in this context.

var <- ReadPISA("./var")
obj[['var']] <- CreateAssayObject(var[,colnames(obj)], min.cells = 10)
# Switch working assay to 'var'
DefaultAssay(obj) <- "var"

obj <- NormalizeData(obj)
Warning: The `slot` argument of `SetAssayData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
ℹ The deprecated feature was likely used in the Seurat package.
  Please report the issue at <https://github.com/satijalab/seurat/issues>.
Warning: The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
ℹ The deprecated feature was likely used in the Seurat package.
  Please report the issue at <https://github.com/satijalab/seurat/issues>.
# you can skip to read GTF again if you already load it
gtf <- gtf2db("./gencode.v44.annotation.gtf.gz")
[2026-01-13 12:57:50] GTF loading..
[2026-01-13 12:58:34] Load 62700 genes.
[2026-01-13 12:58:34] Load time : 43.905 sec
# At this moment the meta infomation for genetic variant assay is still empty
obj[['var']][[]] %>% head
data frame with 0 columns and 6 rows
obj <- annoVAR(obj, gtf = gtf)
[warnings] Chromosome KI270442.1 not found in GTF, use wrong database? 
[warnings] Chromosome KI270729.1 not found in GTF, use wrong database? 
[warnings] Chromosome GL000205.2 not found in GTF, use wrong database? 
[warnings] Chromosome GL000195.1 not found in GTF, use wrong database? 
[warnings] Chromosome KI270733.1 not found in GTF, use wrong database? 
[warnings] Chromosome GL000219.1 not found in GTF, use wrong database? 
[warnings] Chromosome GL000216.2 not found in GTF, use wrong database? 
[warnings] Chromosome GL000218.1 not found in GTF, use wrong database? 
[warnings] Chromosome KI270713.1 not found in GTF, use wrong database? 
# After annotation, the locations are parsed from the feature name, and overlapped gene are also annotated.
obj[['var']][[]] %>% head %>% knitr::kable()
chr start ref alt strand locus gene_name type
chr1:633192G=/+ chr1 633192 G G + chr1:633192/+ MTCO2P12 utr5
chr1:634337T>C/+ chr1 634337 T C + chr1:634337/+ MTATP6P1 utr5
chr1:853876T>C/+ chr1 853876 T C + chr1:853876/+ LINC01128 utr5
chr1:944296G>A/- chr1 944296 G A - chr1:944296/- NOC2L utr3
chr1:944296G>A/+ chr1 944296 G A + chr1:944296/+ SAMD11 utr5
chr1:944307T>C/+ chr1 944307 T C + chr1:944307/+ SAMD11 utr5
# Select autocorrlated genetic variants 
obj <- RunAutoCorr(obj)

In the alternative splicing analysis, we use mode 1 (the default mode). However, for allele-specific gene expression, we switch to mode 2 to compare the expression of an SNV with other SNVs at the same location. Since no binding assay is specified, the records at the same locus are first aggregated, and then the test feature is subtracted from the aggregated data (mode 2).

obj <- RunSDT(object = obj, bind.name = "locus", mode = 2)
Working on assay var.
Use predefined weight matrix "pca_wm".
Processing 13362 features.
No bind.assay specified, update min.features.per.block to 2.
Processing 4097 binding features.
Use "counts" layer for test features.
Aggregate counts for binding features.
Using 63 threads.
Runtime : 9.111135 secs.
sel.chrs <- c(1:21, "X")
FbtPlot(obj, val = "locus.padj", remove.chr = TRUE, sel.chrs = sel.chrs)
Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.
ℹ The deprecated feature was likely used in the Yano package.
  Please report the issue to the authors.
Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.
ℹ The deprecated feature was likely used in the Yano package.
  Please report the issue to the authors.

# Let's random pick a pair of features at the same locus
obj[['var']][[]] %>% filter(locus.padj < 1e-10) %>% knitr::kable()
chr start ref alt strand locus gene_name type moransi.pval moransi autocorr.variable locus.D locus.t locus.pval locus.padj
chr10:12250345A>G/+ chr10 12250345 A G + chr10:12250345/+ CDC123 utr5 0 0.0476105 TRUE 0.3757408 -9.481203 0 0
chr10:17237326C>T/+ chr10 17237326 C T + chr10:17237326/+ VIM utr5 0 0.6889002 TRUE 1.4660226 -25.466903 0 0
chr10:17237326C=/+ chr10 17237326 C C + chr10:17237326/+ VIM utr5 0 0.5874119 TRUE 0.9576055 -27.505608 0 0
chr10:62069780C=/+ chr10 62069780 C C + chr10:62069780/+ ARID5B exon 0 0.1016406 TRUE 0.4025481 -10.877033 0 0
chr10:62069780C>T/+ chr10 62069780 C T + chr10:62069780/+ ARID5B exon 0 0.1112858 TRUE 0.4432294 -9.775477 0 0
chr10:71816550C=/- chr10 71816550 C C - chr10:71816550/- PSAP utr3 0 0.3129525 TRUE 0.5562426 -16.306266 0 0
chr2:74969604A>G/+ chr2 74969604 A G + chr2:74969604/+ POLE4 utr5 0 0.1183157 TRUE 0.5381954 -16.739082 0 0
chr20:63521953A>C/+ chr20 63521953 A C + chr20:63521953/+ PPDPF utr5 0 0.1016253 TRUE 0.3995523 -10.251637 0 0
chrM:8860A>G/+ chrM 8860 A G + chrM:8860/+ MT-ATP6 exon 0 0.1675397 TRUE 0.5565051 -12.042075 0 0
FeaturePlot(obj, features = c("chr10:17237326C>T/+", "chr10:17237326C=/+", "VIM" ), order=TRUE, pt.size = 1, ncol=3)
Warning: The `slot` argument of `FetchData()` is deprecated as of SeuratObject 5.0.0.
ℹ Please use the `layer` argument instead.
ℹ The deprecated feature was likely used in the Seurat package.
  Please report the issue at <https://github.com/satijalab/seurat/issues>.
Warning: Could not find VIM in the default search locations, found in 'RNA'
assay instead

# We could also zoom in specific region
FbtPlot(obj, val = "locus.padj", chr="chr10", start = 17225000, end = 17240000, gtf = gtf)
Zoom in chr10 : 17224000-17241000
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
ℹ The deprecated feature was likely used in the Yano package.
  Please report the issue to the authors.

It appears that the T allele of the gene VIM is expressed in most cell groups, while the C allele seems to be primarily expressed in groups 1 and 4.

Using our spatial dissimilarity analysis method, we have identified many cases of allele-specific gene expression. This phenomenon can be influenced by several known and unknown mechanisms, including gene imprinting, genetic recombination, and somatic variants during cell or disease development. While we won’t explore these mechanisms in detail here, you can refer to our manuscript, which includes several case studies dedicated to this topic.

# Save the Seurat object for further use
saveRDS(obj, file = "glioblastoma5k.rds")
Command(obj)
 [1] "NormalizeData.RNA"        "FindVariableFeatures.RNA"
 [3] "ScaleData.RNA"            "RunPCA.RNA"              
 [5] "FindNeighbors.RNA.pca"    "FindClusters"            
 [7] "RunUMAP.RNA.pca"          "NormalizeData.var"       
 [9] "ParseVarName.var"         "annoVAR.var"             
[11] "RunAutoCorr.var.pca"      "SetAutoCorrFeatures.var" 
[13] "RunSDT.var"              
sessionInfo()
R version 4.5.2 (2025-10-31)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future_1.67.0      dplyr_1.1.4        Seurat_5.3.0       SeuratObject_5.2.0
[5] sp_2.2-0           ggplot2_4.0.0      Yano_1.2          

loaded via a namespace (and not attached):
  [1] deldir_2.0-4           pbapply_1.7-4          gridExtra_2.3         
  [4] rlang_1.1.6            magrittr_2.0.4         RcppAnnoy_0.0.22      
  [7] spatstat.geom_3.6-0    matrixStats_1.5.0      ggridges_0.5.7        
 [10] compiler_4.5.2         png_0.1-8              vctrs_0.6.5           
 [13] reshape2_1.4.4         stringr_1.5.2          pkgconfig_2.0.3       
 [16] fastmap_1.2.0          labeling_0.4.3         promises_1.3.3        
 [19] rmarkdown_2.30         purrr_1.1.0            xfun_0.53             
 [22] jsonlite_2.0.0         goftest_1.2-3          later_1.4.4           
 [25] spatstat.utils_3.2-0   irlba_2.3.5.1          parallel_4.5.2        
 [28] cluster_2.1.8.1        R6_2.6.1               ica_1.0-3             
 [31] stringi_1.8.7          RColorBrewer_1.1-3     spatstat.data_3.1-9   
 [34] reticulate_1.43.0      spatstat.univar_3.1-4  parallelly_1.45.1     
 [37] lmtest_0.9-40          scattermore_1.2        Rcpp_1.1.0            
 [40] knitr_1.50             tensor_1.5.1           future.apply_1.20.0   
 [43] zoo_1.8-14             R.utils_2.13.0         sctransform_0.4.2     
 [46] httpuv_1.6.16          Matrix_1.7-4           splines_4.5.2         
 [49] igraph_2.2.0           tidyselect_1.2.1       viridis_0.6.5         
 [52] abind_1.4-8            yaml_2.3.10            spatstat.random_3.4-2 
 [55] codetools_0.2-19       miniUI_0.1.2           spatstat.explore_3.5-3
 [58] listenv_0.9.1          lattice_0.22-5         tibble_3.3.0          
 [61] plyr_1.8.9             withr_3.0.2            shiny_1.11.1          
 [64] S7_0.2.0               ROCR_1.0-11            evaluate_1.0.5        
 [67] Rtsne_0.17             fastDummies_1.7.5      survival_3.8-3        
 [70] polyclip_1.10-7        fitdistrplus_1.2-4     pillar_1.11.1         
 [73] KernSmooth_2.23-26     plotly_4.11.0          generics_0.1.4        
 [76] RcppHNSW_0.6.0         scales_1.4.0           gtools_3.9.5          
 [79] globals_0.18.0         xtable_1.8-4           glue_1.8.0            
 [82] lazyeval_0.2.2         tools_4.5.2            data.table_1.17.8     
 [85] RSpectra_0.16-2        RANN_2.6.2             dotCall64_1.2         
 [88] cowplot_1.2.0          grid_4.5.2             tidyr_1.3.1           
 [91] nlme_3.1-168           patchwork_1.3.2        cli_3.6.5             
 [94] spatstat.sparse_3.1-0  spam_2.11-1            viridisLite_0.4.2     
 [97] uwot_0.2.3             gtable_0.3.6           R.methodsS3_1.8.2     
[100] digest_0.6.37          progressr_0.17.0       ggrepel_0.9.6         
[103] htmlwidgets_1.6.4      farver_2.1.2           htmltools_0.5.8.1     
[106] R.oo_1.27.1            lifecycle_1.0.4        httr_1.4.7            
[109] mime_0.13              MASS_7.3-65           
Back to top
Alternative splicing analysis
Annotating genetic variants