simba.tl.gene_scores

simba.tl.gene_scores(adata, genome, gene_anno=None, tss_upstream=100000.0, tss_downsteam=100000.0, gb_upstream=5000, cutoff_weight=1, use_top_pcs=True, use_precomputed=True, use_gene_weigt=True, min_w=1, max_w=5)[source]

Calculate gene scores

Parameters:

adata (AnnData) – Annotated data matrix.
genome (str) – Reference genome. One of {‘hg19’, ‘hg38’, ‘mm9’, ‘mm10’}.
gene_anno (pandas.DataFrame, optional (default: None)) – DataFrame of gene annotation. If None, the built-in gene annotation for the chosen genome is used; if provided, the custom gene annotation is used instead.
tss_upstream (int, optional (default: 1e5)) – The number of base pairs upstream of the transcription start site (TSS).
tss_downsteam (int, optional (default: 1e5)) – The number of base pairs downstream of the TSS.
gb_upstream (int, optional (default: 5000)) – The number of base pairs upstream by which the gene body is extended. Peaks within the extended gene body are given a weight of 1.
cutoff_weight (float, optional (default: 1)) – Weight cutoff for peaks
use_top_pcs (bool, optional (default: True)) – If True, only peaks associated with the top principal components (PCs) will be used.
use_precomputed (bool, optional (default: True)) – If True, the overlap between peaks and genes (stored in adata.uns[‘gene_scores’][‘overlap’]) will be loaded rather than recomputed.
use_gene_weigt (bool, optional (default: True)) – If True, for each gene, the number of peaks assigned to it will be rescaled based on gene size
min_w (int, optional (default: 1)) – The minimum weight for each gene. Only used if use_gene_weigt is True.
max_w (int, optional (default: 5)) – The maximum weight for each gene. Only used if use_gene_weigt is True.
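
To make the window parameters concrete, the following is a minimal sketch of how tss_upstream, tss_downsteam and gb_upstream could translate into genomic intervals. It is not SIMBA's implementation: the helper tss_windows, the column names (symbol, chr, start, end, strand), and the strand-aware handling are assumptions chosen for illustration, and the page above does not specify the layout expected for gene_anno.

```python
import pandas as pd


def tss_windows(genes, tss_upstream=100_000, tss_downsteam=100_000, gb_upstream=5_000):
    """Hypothetical helper: strand-aware TSS windows and extended gene bodies.

    `genes` is assumed to have columns start, end, strand (illustrative layout only).
    """
    plus = genes["strand"] == "+"
    # The TSS is the gene start on the + strand and the gene end on the - strand.
    tss = genes["start"].where(plus, genes["end"])

    out = genes.copy()
    out["tss"] = tss
    # "Upstream" points to lower coordinates on + and to higher coordinates on -.
    out["win_start"] = (tss - tss_upstream).where(plus, tss - tss_downsteam).clip(lower=0)
    out["win_end"] = (tss + tss_downsteam).where(plus, tss + tss_upstream)
    # Extend the gene body by gb_upstream on its upstream side only.
    out["body_start"] = (genes["start"] - gb_upstream).where(plus, genes["start"]).clip(lower=0)
    out["body_end"] = genes["end"].where(plus, genes["end"] + gb_upstream)
    return out


# Toy annotation with one gene per strand (made-up coordinates).
genes = pd.DataFrame({
    "symbol": ["GENE_A", "GENE_B"],
    "chr": ["chr1", "chr1"],
    "start": [200_000, 500_000],
    "end": [250_000, 520_000],
    "strand": ["+", "-"],
})
windows = tss_windows(genes)
print(windows[["symbol", "tss", "win_start", "win_end", "body_start", "body_end"]])
```

In this sketch, a peak falling inside body_start..body_end would be the kind of peak that, per the gb_upstream description, receives a weight of 1; how weights are assigned elsewhere in the TSS window is not described on this page and is left out.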

Returns:

adata_new (AnnData) – Annotated data matrix that stores the gene score matrix (cells x genes).
Also updates adata with the following field:
overlap (pandas.DataFrame, (adata.uns[‘gene_scores’][‘overlap’])) – Dataframe of overlap between peaks and genes
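
A call might look like the sketch below, which uses only the parameters documented above. The wrapper compute_gene_scores and the variable names are hypothetical, and the sketch assumes the input is a cells-by-peaks AnnData object that has already been preprocessed as SIMBA expects (for example, use_top_pcs=True presumably requires that top PCs and their associated peaks were selected beforehand).

```python
def compute_gene_scores(adata_peaks, genome="hg38"):
    """Hypothetical wrapper around simba.tl.gene_scores (illustrative only)."""
    # Imported lazily so this sketch can be loaded without SIMBA installed.
    import simba as si

    # Returns a new AnnData holding the cells x genes gene score matrix.
    adata_genes = si.tl.gene_scores(
        adata_peaks,
        genome=genome,
        use_gene_weigt=True,  # parameter name spelled as in the signature
        use_top_pcs=True,
    )
    # Per the Returns section, the peak-gene overlap is stored on the input object.
    overlap = adata_peaks.uns["gene_scores"]["overlap"]
    return adata_genes, overlap
```

With use_precomputed left at its default of True, a second call on the same adata_peaks would reuse the stored overlap table instead of computing it again.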