Detect alternative TSS/CTSS
In order to detect the alternative TSS or CTSS, we directly exploit BRIE2 which is based on a Bayesian hierarchical model.
We should make preprocessing for our TSS data from CamoTSS.
prepare h5ad file
[3]:
import warnings
warnings.filterwarnings('ignore')
[4]:
import pandas as pd
import scanpy as sc
import numpy as np
Here, we can first read the count matrix which has genes with multiple TSSs from the CamoTSS_out file.
[6]:
TSSadata=sc.read('/storage/yhhuang/users/ruiyan/NPC/srafiledir/scTSS_out/all_unannotation/count/scTSS_count_two.h5ad')
TSSadata
[6]:
AnnData object with n_obs × n_vars = 51001 × 4226
var: 'TSS_start', 'TSS_end', 'gene_id', 'Chromosome', 'Feature', 'Start', 'End', 'Strand', 'gene_name', 'len'
expadata is the expression data which can be obtained from either cellranger count or adding annotation by yourself.
[7]:
expadata=sc.read('/storage/yhhuang/users/ruiyan/NPC/srafiledir/cellranger_change_index/all_cellranger_NPC.h5ad')
expadata
[7]:
AnnData object with n_obs × n_vars = 3863735 × 33538
obs: 'index', 'cell_id', 'batch'
var: 'gene_ids', 'feature_types'
In order to build relationship with TSS data, we should first keep the var.index the same with the prefix of TSS id
[8]:
expadata.var['gene_name']=expadata.var.index
expadata.var.index=expadata.var['gene_ids']
expadata.var
[8]:
| gene_ids | feature_types | gene_name | |
|---|---|---|---|
| gene_ids | |||
| ENSG00000243485 | ENSG00000243485 | Gene Expression | MIR1302-2HG |
| ENSG00000237613 | ENSG00000237613 | Gene Expression | FAM138A |
| ENSG00000186092 | ENSG00000186092 | Gene Expression | OR4F5 |
| ENSG00000238009 | ENSG00000238009 | Gene Expression | AL627309.1 |
| ENSG00000239945 | ENSG00000239945 | Gene Expression | AL627309.3 |
| ... | ... | ... | ... |
| ENSG00000277856 | ENSG00000277856 | Gene Expression | AC233755.2 |
| ENSG00000275063 | ENSG00000275063 | Gene Expression | AC233755.1 |
| ENSG00000271254 | ENSG00000271254 | Gene Expression | AC240274.1 |
| ENSG00000277475 | ENSG00000277475 | Gene Expression | AC213203.1 |
| ENSG00000268674 | ENSG00000268674 | Gene Expression | FAM231C |
33538 rows × 3 columns
To make sure the cell barcode unique
[9]:
expadata.obs_names_make_unique()
expadata
[9]:
AnnData object with n_obs × n_vars = 3863735 × 33538
obs: 'index', 'cell_id', 'batch'
var: 'gene_ids', 'feature_types', 'gene_name'
Select the TSS with max count
[10]:
genedf=adata.var.copy()
genedf
[10]:
| TSS_start | TSS_end | gene_id | Chromosome | Feature | Start | End | Strand | gene_name | len | |
|---|---|---|---|---|---|---|---|---|---|---|
| transcript_id | ||||||||||
| ENSG00000197530_newTSS | 1629591 | 1629697 | ENSG00000197530 | 1 | gene | 1615414 | 1630610 | + | MIB2 | 15196 |
| ENSG00000197530_ENST00000518681 | 1615420 | 1615554 | ENSG00000197530 | 1 | gene | 1615414 | 1630610 | + | MIB2 | 15196 |
| ENSG00000197530_ENST00000510793 | 1616194 | 1616348 | ENSG00000197530 | 1 | gene | 1615414 | 1630610 | + | MIB2 | 15196 |
| ENSG00000189409_ENST00000435358 | 1633111 | 1633230 | ENSG00000189409 | 1 | gene | 1632162 | 1635263 | + | MMP23B | 3101 |
| ENSG00000189409_ENST00000472264 | 1632164 | 1632228 | ENSG00000189409 | 1 | gene | 1632162 | 1635263 | + | MMP23B | 3101 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ENSG00000089820_ENST00000461052 | 153926592 | 153926836 | ENSG00000089820 | X | gene | 153907366 | 153934999 | - | ARHGAP4 | 27633 |
| ENSG00000013563_newTSS | 154411909 | 154412002 | ENSG00000013563 | X | gene | 154401235 | 154412112 | - | DNASE1L1 | 10877 |
| ENSG00000013563_ENST00000369807 | 154409083 | 154409253 | ENSG00000013563 | X | gene | 154401235 | 154412112 | - | DNASE1L1 | 10877 |
| ENSG00000183878_ENST00000479713 | 13479746 | 13479934 | ENSG00000183878 | Y | gene | 13248378 | 13480673 | - | UTY | 232295 |
| ENSG00000183878_newTSS | 13479231 | 13479358 | ENSG00000183878 | Y | gene | 13248378 | 13480673 | - | UTY | 232295 |
4226 rows × 10 columns
[11]:
genedf['count']=np.sum(adata.X,axis=0)
genedf
[11]:
| TSS_start | TSS_end | gene_id | Chromosome | Feature | Start | End | Strand | gene_name | len | count | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| transcript_id | |||||||||||
| ENSG00000197530_newTSS | 1629591 | 1629697 | ENSG00000197530 | 1 | gene | 1615414 | 1630610 | + | MIB2 | 15196 | 149.0 |
| ENSG00000197530_ENST00000518681 | 1615420 | 1615554 | ENSG00000197530 | 1 | gene | 1615414 | 1630610 | + | MIB2 | 15196 | 521.0 |
| ENSG00000197530_ENST00000510793 | 1616194 | 1616348 | ENSG00000197530 | 1 | gene | 1615414 | 1630610 | + | MIB2 | 15196 | 59.0 |
| ENSG00000189409_ENST00000435358 | 1633111 | 1633230 | ENSG00000189409 | 1 | gene | 1632162 | 1635263 | + | MMP23B | 3101 | 117.0 |
| ENSG00000189409_ENST00000472264 | 1632164 | 1632228 | ENSG00000189409 | 1 | gene | 1632162 | 1635263 | + | MMP23B | 3101 | 96.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ENSG00000089820_ENST00000461052 | 153926592 | 153926836 | ENSG00000089820 | X | gene | 153907366 | 153934999 | - | ARHGAP4 | 27633 | 127.0 |
| ENSG00000013563_newTSS | 154411909 | 154412002 | ENSG00000013563 | X | gene | 154401235 | 154412112 | - | DNASE1L1 | 10877 | 901.0 |
| ENSG00000013563_ENST00000369807 | 154409083 | 154409253 | ENSG00000013563 | X | gene | 154401235 | 154412112 | - | DNASE1L1 | 10877 | 838.0 |
| ENSG00000183878_ENST00000479713 | 13479746 | 13479934 | ENSG00000183878 | Y | gene | 13248378 | 13480673 | - | UTY | 232295 | 3606.0 |
| ENSG00000183878_newTSS | 13479231 | 13479358 | ENSG00000183878 | Y | gene | 13248378 | 13480673 | - | UTY | 232295 | 139.0 |
4226 rows × 11 columns
[12]:
keepdf=genedf.sort_values(['gene_id','count']).groupby('gene_id').tail(2)
keepdf
[12]:
| TSS_start | TSS_end | gene_id | Chromosome | Feature | Start | End | Strand | gene_name | len | count | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| transcript_id | |||||||||||
| ENSG00000000460_newTSS | 169683373 | 169683419 | ENSG00000000460 | 1 | gene | 169662006 | 169854080 | + | C1orf112 | 192074 | 418.0 |
| ENSG00000000460_ENST00000359326 | 169794958 | 169795195 | ENSG00000000460 | 1 | gene | 169662006 | 169854080 | + | C1orf112 | 192074 | 751.0 |
| ENSG00000000938_newTSS | 27626457 | 27626515 | ENSG00000000938 | 1 | gene | 27612063 | 27635185 | - | FGR | 23122 | 325.0 |
| ENSG00000000938_ENST00000457296 | 27626045 | 27626174 | ENSG00000000938 | 1 | gene | 27612063 | 27635185 | - | FGR | 23122 | 732.0 |
| ENSG00000001167_newTSS | 41080778 | 41080854 | ENSG00000001167 | 6 | gene | 41072944 | 41099976 | + | NFYA | 27032 | 103.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ENSG00000284691_ENST00000641717 | 150405407 | 150405450 | ENSG00000284691 | 7 | gene | 150400701 | 150412470 | + | AC073111.4 | 11769 | 100.0 |
| ENSG00000285077_ENST00000567449 | 30624493 | 30624569 | ENSG00000285077 | 15 | gene | 30624493 | 30649529 | + | AC091057.6 | 25036 | 55.0 |
| ENSG00000285077_newTSS | 30626051 | 30626146 | ENSG00000285077 | 15 | gene | 30624493 | 30649529 | + | AC091057.6 | 25036 | 421.0 |
| ENSG00000285106_newTSS | 131109754 | 131109946 | ENSG00000285106 | 7 | gene | 130791263 | 131110161 | - | AC016831.7 | 318898 | 226.0 |
| ENSG00000285106_ENST00000642963 | 131107859 | 131108049 | ENSG00000285106 | 7 | gene | 130791263 | 131110161 | - | AC016831.7 | 318898 | 6762.0 |
3568 rows × 11 columns
[13]:
firstdf=keepdf[::2]
firstdf
[13]:
| TSS_start | TSS_end | gene_id | Chromosome | Feature | Start | End | Strand | gene_name | len | count | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| transcript_id | |||||||||||
| ENSG00000000460_newTSS | 169683373 | 169683419 | ENSG00000000460 | 1 | gene | 169662006 | 169854080 | + | C1orf112 | 192074 | 418.0 |
| ENSG00000000938_newTSS | 27626457 | 27626515 | ENSG00000000938 | 1 | gene | 27612063 | 27635185 | - | FGR | 23122 | 325.0 |
| ENSG00000001167_newTSS | 41080778 | 41080854 | ENSG00000001167 | 6 | gene | 41072944 | 41099976 | + | NFYA | 27032 | 103.0 |
| ENSG00000001460_newTSS | 24413671 | 24413692 | ENSG00000001460 | 1 | gene | 24356998 | 24416934 | - | STPG1 | 59936 | 57.0 |
| ENSG00000001631_newTSS | 92245433 | 92245509 | ENSG00000001631 | 7 | gene | 92198968 | 92246166 | - | KRIT1 | 47198 | 142.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ENSG00000278558_newTSS | 18528598 | 18528654 | ENSG00000278558 | 22 | gene | 18527801 | 18530573 | + | TMEM191B | 2772 | 97.0 |
| ENSG00000278709_newTSS | 57710609 | 57710825 | ENSG00000278709 | 20 | gene | 57710155 | 57712780 | + | NKILA | 2625 | 71.0 |
| ENSG00000284691_newTSS | 150408026 | 150408165 | ENSG00000284691 | 7 | gene | 150400701 | 150412470 | + | AC073111.4 | 11769 | 93.0 |
| ENSG00000285077_ENST00000567449 | 30624493 | 30624569 | ENSG00000285077 | 15 | gene | 30624493 | 30649529 | + | AC091057.6 | 25036 | 55.0 |
| ENSG00000285106_newTSS | 131109754 | 131109946 | ENSG00000285106 | 7 | gene | 130791263 | 131110161 | - | AC016831.7 | 318898 | 226.0 |
1784 rows × 11 columns
[14]:
seconddf=keepdf[1::2]
seconddf
[14]:
| TSS_start | TSS_end | gene_id | Chromosome | Feature | Start | End | Strand | gene_name | len | count | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| transcript_id | |||||||||||
| ENSG00000000460_ENST00000359326 | 169794958 | 169795195 | ENSG00000000460 | 1 | gene | 169662006 | 169854080 | + | C1orf112 | 192074 | 751.0 |
| ENSG00000000938_ENST00000457296 | 27626045 | 27626174 | ENSG00000000938 | 1 | gene | 27612063 | 27635185 | - | FGR | 23122 | 732.0 |
| ENSG00000001167_ENST00000341376 | 41072944 | 41073076 | ENSG00000001167 | 6 | gene | 41072944 | 41099976 | + | NFYA | 27032 | 2039.0 |
| ENSG00000001460_ENST00000483261 | 24391926 | 24392095 | ENSG00000001460 | 1 | gene | 24356998 | 24416934 | - | STPG1 | 59936 | 58.0 |
| ENSG00000001631_ENST00000412043 | 92245674 | 92245834 | ENSG00000001631 | 7 | gene | 92198968 | 92246166 | - | KRIT1 | 47198 | 4188.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ENSG00000278558_ENST00000618859 | 18529298 | 18529492 | ENSG00000278558 | 22 | gene | 18527801 | 18530573 | + | TMEM191B | 2772 | 358.0 |
| ENSG00000278709_ENST00000614771 | 57710157 | 57710345 | ENSG00000278709 | 20 | gene | 57710155 | 57712780 | + | NKILA | 2625 | 495.0 |
| ENSG00000284691_ENST00000641717 | 150405407 | 150405450 | ENSG00000284691 | 7 | gene | 150400701 | 150412470 | + | AC073111.4 | 11769 | 100.0 |
| ENSG00000285077_newTSS | 30626051 | 30626146 | ENSG00000285077 | 15 | gene | 30624493 | 30649529 | + | AC091057.6 | 25036 | 421.0 |
| ENSG00000285106_ENST00000642963 | 131107859 | 131108049 | ENSG00000285106 | 7 | gene | 130791263 | 131110161 | - | AC016831.7 | 318898 | 6762.0 |
1784 rows × 11 columns
Get the expression data with the same cell and same genes as TSS adata
[19]:
keepexpadata=expadata[adata.obs.index.tolist(),seconddf['gene_id'].tolist()]
keepexpadata
[19]:
View of AnnData object with n_obs × n_vars = 51001 × 1784
obs: 'index', 'cell_id', 'batch'
var: 'gene_ids', 'feature_types', 'gene_name'
[20]:
keepexpadata.X.toarray()
[20]:
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 1.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 1.]], dtype=float32)
Add isoform1 and isoform2 to the layer of expression data and add the name of isoform1 and the name of isoform2 to the gene name
[21]:
isoform1adata=TSSadata[:,firstdf.index.tolist()]
isoform1adata
[21]:
View of AnnData object with n_obs × n_vars = 51001 × 1784
var: 'TSS_start', 'TSS_end', 'gene_id', 'Chromosome', 'Feature', 'Start', 'End', 'Strand', 'gene_name', 'len'
[23]:
isoform1adata.X
[23]:
ArrayView([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
[22]:
isoform2adata=TSSadata[:,seconddf.index.tolist()]
isoform2adata
[22]:
View of AnnData object with n_obs × n_vars = 51001 × 1784
var: 'TSS_start', 'TSS_end', 'gene_id', 'Chromosome', 'Feature', 'Start', 'End', 'Strand', 'gene_name', 'len'
[24]:
isoform2adata.X
[24]:
ArrayView([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 1.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 2.]], dtype=float32)
[25]:
keepexpadata.layers['isoform1']=isoform1adata.X
keepexpadata.layers['isoform2']=isoform2adata.X
[26]:
keepexpadata.var['isoform1_name']=isoform1adata.var.index.values
keepexpadata.var['isoform2_name']=isoform2adata.var.index.values
We did not the ambiguous count value, so we set them as zero.
[27]:
keepexpadata.layers['ambiguous']=np.zeros((51001, 1784))
keepexpadata
[27]:
AnnData object with n_obs × n_vars = 51001 × 1784
obs: 'index', 'cell_id', 'batch'
var: 'gene_ids', 'feature_types', 'gene_name', 'isoform1_name', 'isoform2_name'
layers: 'isoform1', 'isoform2', 'ambiguous'
In order to run BRIE2 smoothly, we set the type of data as float32
[28]:
keepexpadata.X=keepexpadata.X.astype('float32')
keepexpadata.layers['isoform1']=keepexpadata.layers['isoform1'].astype('float32')
keepexpadata.layers['isoform2']=keepexpadata.layers['isoform2'].astype('float32')
keepexpadata.layers['ambiguous']=keepexpadata.layers['ambiguous'].astype('float32')
Please make sure the expression profile of keepexpadata is raw data and the counts of isoform1 and isoform2 are also raw data
[29]:
keepexpadata.X.toarray()
[29]:
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 1.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 1.]], dtype=float32)
[30]:
keepexpadata.layers['isoform1'].toarray()
[30]:
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
[31]:
keepexpadata.layers['isoform2'].toarray()
[31]:
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 1.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 2.]], dtype=float32)
[ ]:
keepexpadata.write('/storage/yhhuang/users/ruiyan/NPC/express_to_brie.h5ad')
Get h5ad file for each cell type
Actually, here, we can get the alternative TSS usage for each cell type between NLH and NPC (two conditions).
In another words, the number that you need run BRIE2 is the same with the number of cell type.
So you need to do cell annotation for these cells and then separate them to get h5ad file for each cell types.
[33]:
annodata=sc.read('/storage/yhhuang/users/ruiyan/NPC/srafiledir/scTSS_out/NPC_new_normalized.h5ad')
annodata
[33]:
AnnData object with n_obs × n_vars = 51001 × 4226
obs: 'cluster', 'UMAP_1', 'UMAP_2', 'patient_ID', 'condition'
var: 'TSS_start', 'TSS_end', 'gene_id', 'Chromosome', 'Feature', 'Start', 'End', 'Strand', 'gene_name', 'len'
uns: 'log1p'
obsm: 'X_umap'
layers: 'counts'
Map cell type information for each cell
[34]:
keepexpadata.obs['cluster']=annodata.obs.index.map(annodata.obs['cluster'])
keepexpadata
[34]:
AnnData object with n_obs × n_vars = 51001 × 1784
obs: 'index', 'cell_id', 'batch', 'cluster'
var: 'gene_ids', 'feature_types', 'gene_name', 'isoform1_name', 'isoform2_name'
layers: 'isoform1', 'isoform2', 'ambiguous'
Note Please replace the path with your own path to store all h5ad file of cell type
[36]:
def get_celltype_adata(cellType):
celltypeadata=keepexpadata[keepexpadata.obs['cluster']==cellType,:]
celltype=cellType.replace(' ','_')
file=celltype+'.h5ad'
outputpath=os.path.join('/storage/yhhuang/users/ruiyan/NPC/diffcluster/',file)
celltypeadata.write(outputpath)
[ ]:
for celltype in keepexpadata.obs['cluster'].unique():
get_celltype_adata(celltype)
Get Cdr file for each cell type
The cdr file means the cell detected rate file. That is the proportion of gene in the all genes for this cell.
For different scenarios, it includes different columns.
If you want to compare two conditions, such as NLH and NPC, then you just give one column to tell BRIE2 that it is NLH or not.
If you want to campare multiple conditions, such as skin,brain and spleen, then you should give binary value for all of these organs.
[38]:
import pandas as pd
import scanpy as sc
import numpy as np
import os
[39]:
keepexpadata=sc.read('/storage/yhhuang/users/ruiyan/NPC/express_to_brie.h5ad')
keepexpadata
[39]:
AnnData object with n_obs × n_vars = 51001 × 1784
obs: 'index', 'cell_id', 'batch'
var: 'gene_ids', 'feature_types', 'gene_name', 'isoform1_name', 'isoform2_name'
layers: 'ambiguous', 'isoform1', 'isoform2'
[40]:
def get_cdr_file(cellcluster):
cellcluster=cellcluster.replace(' ','_')
inputfile='/storage/yhhuang/users/ruiyan/NPC/diffcluster/'+str(cellcluster)+'.h5ad'
clusteradata=sc.read(inputfile)
diseasedf=pd.DataFrame(clusteradata.obs['condition'])
cdrdiseasedf = pd.get_dummies(diseasedf.condition, prefix='disease')
cdr=np.array((clusteradata.X > 0).mean(1))[:,0]
cdrdiseasedf['detect_rate']=cdr
cdrdiseasedf.reset_index(inplace=True)
cdrdiseasedf.drop(labels=['disease_NLH'],axis=1,inplace=True)
print(cdrdiseasedf)
outfile='/storage/yhhuang/users/ruiyan/NPC/diffcluster/NPC_cdr_'+cellcluster+'.tsv'
cdrdiseasedf.to_csv(outfile,sep='\t',index=None)
[ ]:
for celltype in adata.obs['cluster'].unique():
get_cdr_file(celltype)
Run BRIE2
Here, we write a for loop for each cell type to run BRIE2
For more information about BRIE2, you can check this tutorial
[ ]:
#!/bin/bash
arr=(NK Myeloids Epithelial Fibroblast T_cells B_cells)
for i in ${arr[@]}
do
echao $i
brie-quant -i /storage/yhhuang/users/ruiyan/NPC/diffcluster/${i}.h5ad -c /storage/yhhuang/users/ruiyan/NPC/diffcluster/NPC_cdr_${i}.tsv -o /storage/yhhuang/users/ruiyan/NPC/diffcluster/Brie_out/brie_NPC_${i}.h5ad --batchSize 1000000 --minCell 10 --interceptMode gene --testBase full --LRTindex 0
done
Filter genes with TSS shift between two conditions
Here, we set the threshold as FDR<0.01 to filter genes with alternative TSS
[41]:
import pandas as pd
import scanpy as sc
[42]:
def get_sign_gene(celltype):
celltype=celltype.replace(' ','_')
inputfile='/storage/yhhuang/users/ruiyan/NPC/diffcluster/Brie_out/brie_NPC_'+celltype+'.brie_ident.tsv'
briedf=pd.read_csv(inputfile,delimiter='\t')
print(briedf)
signTdf=briedf[briedf['disease_NPC_FDR']<0.01]
outputfile='/storage/yhhuang/users/ruiyan/NPC/diffcluster/sign_Gene/'+celltype+'.csv'
signTdf.to_csv(outputfile,index=None)
[ ]:
for celltype in keepexpadata.obs['cluster'].unique():
get_sign_gene(celltype)
Display gene with alternative TSS using valcano plot
For this part, you can also refer to BRIE2 example.
[43]:
import brie
import scanpy as sc
import matplotlib.pyplot as plt
[44]:
print(brie.__version__)
2.2.0
[45]:
adata=sc.read('/storage/yhhuang/users/ruiyan/NPC/diffcluster/Brie_out/brie_NPC_T_cells.h5ad')
adata
[45]:
AnnData object with n_obs × n_vars = 24571 × 1628
obs: 'cluster', 'UMAP_1', 'UMAP_2', 'patient_ID', 'condition'
var: 'gene_ids', 'feature_types', 'gene_name', 'isoform1_name', 'isoform2_name', 'n_counts', 'n_counts_uniq', 'loss_gene'
uns: 'Xc_ids', 'brie_losses', 'brie_param', 'brie_version'
obsm: 'Xc'
varm: 'ELBO_gain', 'cell_coeff', 'fdr', 'intercept', 'pval', 'sigma'
layers: 'Psi', 'Psi_95CI', 'Z_std', 'ambiguous', 'isoform1', 'isoform2'
[46]:
adata.uns['brie_param']
[46]:
{'LRT_index': array([0]),
'base_mode': 'full',
'intercept_mode': 'gene',
'layer_keys': array(['isoform1', 'isoform2', 'ambiguous'], dtype=object),
'pseudo_count': 0.01}
[47]:
#Change gene index from Ensemebl id to gene name
adata.var.index=adata.var['gene_name']
adata.var.index
[47]:
Index(['C1orf112', 'FGR', 'NFYA', 'STPG1', 'KRIT1', 'SNX11', 'CASP10', 'CFLAR',
'SARM1', 'FKBP4',
...
'UHRF1', 'CCL4L2', 'MYO19', 'GGNBP2', 'ACACA', 'TMEM191B', 'NKILA',
'AC073111.5', 'ARHGAP11B', 'AC016831.7'],
dtype='object', name='gene_name', length=1628)
[49]:
## volcano plot for differential splicing events
#fig = plt.figure(figsize=(4.5, 3.5), dpi=300)
brie.pl.volcano(adata, y='ELBO_gain', log_y=False, n_anno=16, score_red=7, adjust=True)
plt.xlabel('cell_coeff: effect size on logit(Psi)')
plt.title("NPC differential TSS in T cell")
#fig.savefig('/storage/yhhuang/users/ruiyan/figure/figure5/T_volcano.pdf',bbox_inches='tight',dpi=300)
plt.show()
[ ]: