simba.tl.gen_graph

simba.tl.gen_graph(list_CP=None, list_PM=None, list_PK=None, list_CG=None, list_CC=None, prefix_C='C', prefix_P='P', prefix_M='M', prefix_K='K', prefix_G='G', copy=False, dirname='graph0', use_highly_variable=True, use_top_pcs=True, use_top_pcs_CP=None, use_top_pcs_PM=None, use_top_pcs_PK=None)

Generate a graph for PBG (PyTorch-BigGraph) training based on the indices of obs and var. It also generates an accompanying file, ‘entity_alias.tsv’, that maps the indices to the aliases used in the graph.
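To make the idea of an alias file concrete, here is a minimal sketch of mapping original names to prefixed aliases and writing them as tab-separated text. It does not call simba. The alias format `<prefix>.<index>`, the column headers, the example names, and the helper functions `build_alias_rows` and `write_alias_tsv` are all assumptions made for illustration; the actual layout of ‘entity_alias.tsv’ may differ.

```python
import csv
import io


def build_alias_rows(names, prefix):
    """Pair each original name with an alias built from a prefix and its position."""
    # The "<prefix>.<index>" alias format is an assumption made for this sketch.
    return [(name, f"{prefix}.{i}") for i, name in enumerate(names)]


def write_alias_tsv(rows, handle):
    """Write (name, alias) rows as tab-separated text with a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", "alias"])  # hypothetical column headers
    writer.writerows(rows)


cell_names = ["AAACGG-1", "AAAGTC-1"]  # hypothetical obs names
gene_names = ["GATA1", "SPI1"]         # hypothetical var names

# Cells and genes get different prefixes, mirroring prefix_C and prefix_G.
rows = build_alias_rows(cell_names, "C") + build_alias_rows(gene_names, "G")

buffer = io.StringIO()
write_alias_tsv(rows, buffer)
alias_tsv = buffer.getvalue()
```

Keeping the prefix in every alias means the entity type of any node can be recovered from its alias alone.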

Parameters:
  • list_CP (list, optional (default: None)) – A list of anndata objects that store ATAC-seq data (Cells by Peaks)

  • list_PM (list, optional (default: None)) – A list of anndata objects that store relation between Peaks and Motifs

  • list_PK (list, optional (default: None)) – A list of anndata objects that store relation between Peaks and Kmers

  • list_CG (list, optional (default: None)) – A list of anndata objects that store RNA-seq data (Cells by Genes)

  • list_CC (list, optional (default: None)) – A list of anndata objects that store relation between Cells from two conditions

  • prefix_C (str, optional (default: ‘C’)) – Prefix to indicate the entity type of cells

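  • prefix_P (str, optional (default: ‘P’)) – Prefix to indicate the entity type of peaks

  • prefix_M (str, optional (default: ‘M’)) – Prefix to indicate the entity type of motifs

  • prefix_K (str, optional (default: ‘K’)) – Prefix to indicate the entity type of kmers

  • prefix_G (str, optional (default: ‘G’)) – Prefix to indicate the entity type of genes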

  • dirname (str, optional (default: ‘graph0’)) – The name of the directory in which each graph will be stored

  • use_highly_variable (bool, optional (default: True)) – Use highly variable genes

  • use_top_pcs (bool, optional (default: True)) – Use top-PCs-associated features for CP, PM, PK

  • use_top_pcs_CP (bool, optional (default: None)) – Use top-PCs-associated features for CP. Once specified, it overrides use_top_pcs

  • use_top_pcs_PM (bool, optional (default: None)) – Use top-PCs-associated features for PM. Once specified, it overrides use_top_pcs

  • use_top_pcs_PK (bool, optional (default: None)) – Use top-PCs-associated features for PK. Once specified, it overrides use_top_pcs

  • copy (bool, optional (default: False)) – If True, it returns the graph file as a data frame
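The interaction between use_top_pcs and the three relation-specific flags can be sketched as follows. This is an illustration of the documented rule (a specific flag left at None falls back to use_top_pcs; a specific flag that is set takes precedence), not simba's actual implementation. The helper `resolve_top_pcs_flags` is hypothetical.

```python
def resolve_top_pcs_flags(use_top_pcs=True, use_top_pcs_CP=None,
                          use_top_pcs_PM=None, use_top_pcs_PK=None):
    """Return the effective top-PCs setting for each of CP, PM and PK."""
    specific = {
        "CP": use_top_pcs_CP,
        "PM": use_top_pcs_PM,
        "PK": use_top_pcs_PK,
    }
    # None means "not specified": fall back to the general use_top_pcs value.
    return {
        relation: (use_top_pcs if flag is None else flag)
        for relation, flag in specific.items()
    }


# Only CP is overridden; PM and PK inherit use_top_pcs.
flags = resolve_top_pcs_flags(use_top_pcs=True, use_top_pcs_CP=False)
```

Under this reading, a call such as `gen_graph(list_CP=[adata_CP], use_top_pcs=True, use_top_pcs_CP=False)`, where `adata_CP` is a placeholder anndata object, would keep all peaks for CP while the general setting still applies to PM and PK.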

Returns:

  • edges (pd.DataFrame) – Returned only if copy is True. The edges of the graph used for PBG training. Each line describes one edge and contains, separated by tabs, the identifier of the source entity, the relation type, and the identifier of the target entity.

  • Updates .settings.pbg_params with the following parameters:

    • entity_path (str) – The path of the directory containing entity count files.

    • edge_paths (list) – A list of paths to directories containing (partitioned) edgelists. Typically a single path is provided.

    • entities (dict) – The entity types.

    • relations (list) – The relation types.

  • Updates .settings.graph_stats with the following parameter:

    • `dirname` (dict) – Statistics of the input graph
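As a rough illustration of the tab-separated edge format described for edges (source entity, relation type, target entity on each line), the following sketch parses a small hand-written edge list and counts edges per relation. The entity aliases, the relation names `r0` and `r1`, and the helper `read_edges` are made up for this example and are not taken from simba's output.

```python
import csv
import io
from collections import Counter

# Hypothetical edge list: source alias, relation type, target alias.
edges_tsv = "C.0\tr0\tG.0\nC.0\tr0\tG.1\nC.1\tr1\tP.0\n"


def read_edges(handle):
    """Parse tab-separated lines into (source, relation, target) tuples."""
    reader = csv.reader(handle, delimiter="\t")
    return [tuple(row) for row in reader if row]  # skip empty lines


edges = read_edges(io.StringIO(edges_tsv))

# A simple per-relation edge count, one kind of statistic a graph summary might hold.
edges_per_relation = Counter(relation for _, relation, _ in edges)
```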