simba.tl.gene_scores

simba.tl.gene_scores(adata, genome, gene_anno=None, tss_upstream=100000.0, tss_downsteam=100000.0, gb_upstream=5000, cutoff_weight=1, use_top_pcs=True, use_precomputed=True, use_gene_weigt=True, min_w=1, max_w=5)

Calculate gene scores from the peaks assigned to each gene.

Parameters:
  • adata (AnnData) – Annotated data matrix.

  • genome (str) – Reference genome. Choose from {‘hg19’, ‘hg38’, ‘mm9’, ‘mm10’}.

  • gene_anno (pandas.DataFrame, optional (default: None)) – Dataframe of gene annotation. If None, the built-in gene annotation for the chosen genome is used; if provided, the custom gene annotation is used instead.

  • tss_upstream (int, optional (default: 1e5)) – The number of base pairs upstream of the transcription start site (TSS).

  • tss_downsteam (int, optional (default: 1e5)) – The number of base pairs downstream of the TSS.

  • gb_upstream (int, optional (default: 5000)) – The number of base pairs by which the gene body is extended upstream. Peaks within the extended gene body are given a weight of 1.

  • cutoff_weight (float, optional (default: 1)) – Weight cutoff for peaks.

  • use_top_pcs (bool, optional (default: True)) – If True, only peaks associated with top PCs will be used.

  • use_precomputed (bool, optional (default: True)) – If True, the overlap between peaks and genes (stored in adata.uns[‘gene_scores’][‘overlap’]) will be imported.

  • use_gene_weigt (bool, optional (default: True)) – If True, for each gene, the number of peaks assigned to it will be rescaled based on gene size.

  • min_w (int, optional (default: 1)) – The minimum weight for each gene. Only used if use_gene_weigt is True.

  • max_w (int, optional (default: 5)) – The maximum weight for each gene. Only used if use_gene_weigt is True.
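
The window and weight parameters above can be pictured with a small, library-independent sketch. It is not SIMBA's implementation: the strand handling, the coordinate convention, and the min-max rescaling of inverse gene size (so that shorter genes receive larger weights) are all assumptions made for illustration, and every helper name below is hypothetical.

```python
# Illustrative sketch only; helper names and formulas are assumptions,
# not taken from the SIMBA source.

def tss_window(start, end, strand, tss_upstream=100_000, tss_downstream=100_000):
    """Return the (low, high) interval around the TSS, assuming the TSS is
    the gene start on the '+' strand and the gene end on the '-' strand."""
    if strand == "+":
        tss = start
        return max(0, tss - tss_upstream), tss + tss_downstream
    tss = end
    # On the '-' strand, "upstream" points toward larger coordinates.
    return max(0, tss - tss_downstream), tss + tss_upstream


def extended_gene_body(start, end, strand, gb_upstream=5000):
    """Extend the gene body by gb_upstream base pairs on its upstream side."""
    if strand == "+":
        return max(0, start - gb_upstream), end
    return start, end + gb_upstream


def rescale_gene_weights(gene_sizes, min_w=1, max_w=5):
    """Min-max rescale inverse gene size into [min_w, max_w] (assumed formula)."""
    inverse = [1.0 / size for size in gene_sizes]
    low, high = min(inverse), max(inverse)
    if high == low:
        # All genes have the same size; give every gene the minimum weight.
        return [float(min_w)] * len(inverse)
    return [min_w + (max_w - min_w) * (v - low) / (high - low) for v in inverse]


if __name__ == "__main__":
    print(tss_window(200_000, 250_000, "+"))
    print(extended_gene_body(200_000, 250_000, "-"))
    print(rescale_gene_weights([1000, 2000, 4000]))
```

Under these assumptions, a peak falling inside the interval returned by extended_gene_body would receive a weight of 1, while tss_window bounds the wider region in which peaks are considered at all.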

Returns:

  • adata_new (AnnData) – Annotated data matrix. Stores the #cells x #genes gene score matrix.

  Also updates adata with the following fields:

  • overlap (pandas.DataFrame, (adata.uns[‘gene_scores’][‘overlap’])) – Dataframe of overlap between peaks and genes
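
A minimal usage sketch, assuming the package is imported as `import simba as si` and that the input AnnData holds a cells x peaks matrix. The wrapper name `compute_gene_scores` and the variable names are hypothetical; only the keyword arguments come from the signature above. The module is passed in as an argument so the function can be read and exercised without SIMBA installed.

```python
def compute_gene_scores(si, adata_peaks, genome="hg19"):
    """Call si.tl.gene_scores and return the gene-score AnnData together
    with the peak-gene overlap stored on the input object.

    `si` is expected to be the imported simba module (an assumption of
    this sketch); `adata_peaks` is the peak-level AnnData.
    """
    adata_genes = si.tl.gene_scores(
        adata_peaks,
        genome=genome,
        use_top_pcs=False,    # use all peaks rather than only top-PC peaks
        use_gene_weigt=True,  # parameter name spelled as in the signature
        min_w=1,
        max_w=5,
    )
    # Per the Returns section, the overlap table is written back to adata.
    overlap = adata_peaks.uns["gene_scores"]["overlap"]
    return adata_genes, overlap


# Hypothetical usage, assuming simba is installed and adata_CP exists:
#   import simba as si
#   adata_CG, overlap = compute_gene_scores(si, adata_CP, genome="hg19")
```

Note that the keyword names `tss_downsteam` and `use_gene_weigt` must be spelled exactly as they appear in the signature.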