simba.tl.gen_graph
- simba.tl.gen_graph(list_CP=None, list_PM=None, list_PK=None, list_CG=None, list_CC=None, prefix_C='C', prefix_P='P', prefix_M='M', prefix_K='K', prefix_G='G', copy=False, dirname='graph0', use_highly_variable=True, use_top_pcs=True, use_top_pcs_CP=None, use_top_pcs_PM=None, use_top_pcs_PK=None)
Generate a graph for PBG training based on the indices of obs and var. It also generates an accompanying file ‘entity_alias.tsv’ that maps the indices to the aliases used in the graph.
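To make the alias file concrete, the following stdlib-only sketch maps obs/var index names to prefixed aliases and writes them as tab-separated text. The `<prefix>.<position>` alias format, the column header, and the helpers `build_alias_table` and `write_alias_tsv` are assumptions for illustration and are not taken from SIMBA's source.

```python
import csv
import io
from typing import List, Tuple


def build_alias_table(names: List[str], prefix: str) -> List[Tuple[str, str]]:
    """Map each index name to a HYPOTHETICAL alias of the form '<prefix>.<i>'."""
    return [(f"{prefix}.{i}", name) for i, name in enumerate(names)]


def write_alias_tsv(rows: List[Tuple[str, str]]) -> str:
    """Serialize (alias, original_name) pairs as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["alias", "name"])  # hypothetical header row
    writer.writerows(rows)
    return buf.getvalue()


# Toy example: cells come from obs indices, genes from var indices.
cells = ["AAACCTG-1", "AAACGGG-1"]
genes = ["CD3E", "MS4A1", "LYZ"]
rows = build_alias_table(cells, "C") + build_alias_table(genes, "G")
tsv_text = write_alias_tsv(rows)
print(tsv_text)
```

Keeping the original names in a separate table lets the graph itself use short, type-tagged identifiers while the mapping back to barcodes and gene symbols remains recoverable.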
- Parameters:
list_CP (list, optional (default: None)) – A list of anndata objects that store ATAC-seq data (Cells by Peaks)
list_PM (list, optional (default: None)) – A list of anndata objects that store relations between Peaks and Motifs
list_PK (list, optional (default: None)) – A list of anndata objects that store relations between Peaks and Kmers
list_CG (list, optional (default: None)) – A list of anndata objects that store RNA-seq data (Cells by Genes)
list_CC (list, optional (default: None)) – A list of anndata objects that store relations between Cells from two conditions
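prefix_C (str, optional (default: ‘C’)) – Prefix to indicate the entity type of cells
prefix_P (str, optional (default: ‘P’)) – Prefix to indicate the entity type of peaks
prefix_M (str, optional (default: ‘M’)) – Prefix to indicate the entity type of motifs
prefix_K (str, optional (default: ‘K’)) – Prefix to indicate the entity type of kmers
prefix_G (str, optional (default: ‘G’)) – Prefix to indicate the entity type of genes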
dirname (str, optional (default: ‘graph0’)) – The name of the directory in which the graph will be stored
use_highly_variable (bool, optional (default: True)) – If True, use only highly variable genes
use_top_pcs (bool, optional (default: True)) – Use top-PCs-associated features for CP, PM, and PK
use_top_pcs_CP (bool, optional (default: None)) – Use top-PCs-associated features for CP. If specified, it overrides use_top_pcs
use_top_pcs_PM (bool, optional (default: None)) – Use top-PCs-associated features for PM. If specified, it overrides use_top_pcs
use_top_pcs_PK (bool, optional (default: None)) – Use top-PCs-associated features for PK. If specified, it overrides use_top_pcs
copy (bool, optional (default: False)) – If True, return the graph edges as a data frame
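The override rule for the per-relation top-PC flags can be stated as a small function. `resolve_top_pcs` is a hypothetical helper that mirrors the documented behavior (a per-relation value, when given, replaces use_top_pcs); it is a sketch, not SIMBA's internal code.

```python
from typing import Dict, Optional


def resolve_top_pcs(
    use_top_pcs: bool = True,
    use_top_pcs_CP: Optional[bool] = None,
    use_top_pcs_PM: Optional[bool] = None,
    use_top_pcs_PK: Optional[bool] = None,
) -> Dict[str, bool]:
    """Return the effective top-PC setting for each relation type.

    A per-relation flag that is not None overrides the global use_top_pcs;
    None falls back to the global value.
    """
    overrides = {"CP": use_top_pcs_CP, "PM": use_top_pcs_PM, "PK": use_top_pcs_PK}
    return {
        key: use_top_pcs if value is None else value
        for key, value in overrides.items()
    }


print(resolve_top_pcs(use_top_pcs=True, use_top_pcs_PM=False))
# {'CP': True, 'PM': False, 'PK': True}
```

The check uses `is None` rather than truthiness so that an explicit False still overrides a global True.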
- Returns:
If copy is True,
edges (pd.DataFrame) – The edges of the graph used for PBG training, one edge per line. Each line contains, separated by tabs, the identifier of the source entity, the relation type, and the identifier of the target entity.
Updates .settings.pbg_params with the following parameters:
entity_path (str) – The path of the directory containing entity count files.
edge_paths (list) – A list of paths to directories containing (partitioned) edgelists. Typically a single path is provided.
entities (dict) – The entity types.
relations (list) – The relation types.
Updates .settings.graph_stats with the following parameters:
`dirname` (dict) – Statistics of the input graph
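The returned edge table and the entities/relations entries can be related with a short, stdlib-only sketch. It parses tab-separated edge lines, infers each entity type from an alias prefix, and builds dictionaries shaped like PyTorch-BigGraph's configuration (`num_partitions` per entity type; `name`, `lhs`, `rhs`, `operator` per relation), plus a per-relation edge count of the kind a graph summary might hold. The alias format `<prefix>.<index>`, the relation names, and the helpers `parse_edges`, `entity_type`, and `summarize` are hypothetical; this is not SIMBA's implementation, and the exact contents of .settings.pbg_params and .settings.graph_stats may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]


def parse_edges(text: str) -> List[Edge]:
    """Split 'source<TAB>relation<TAB>target' lines into tuples."""
    edges = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        source, relation, target = line.split("\t")
        edges.append((source, relation, target))
    return edges


def entity_type(alias: str) -> str:
    """Return the alias prefix, ASSUMING the hypothetical format '<prefix>.<index>'."""
    return alias.split(".", 1)[0]


def summarize(edges: List[Edge]):
    """Derive PBG-style 'entities' and 'relations' plus per-relation edge counts."""
    entities: Dict[str, dict] = {}
    relations: Dict[str, dict] = {}  # dicts keep insertion order (Python 3.7+)
    counts: Counter = Counter()
    for source, relation, target in edges:
        lhs, rhs = entity_type(source), entity_type(target)
        # One partition per entity type keeps the sketch simple.
        entities.setdefault(lhs, {"num_partitions": 1})
        entities.setdefault(rhs, {"num_partitions": 1})
        spec = relations.setdefault(
            relation, {"name": relation, "lhs": lhs, "rhs": rhs, "operator": "none"}
        )
        if (spec["lhs"], spec["rhs"]) != (lhs, rhs):
            raise ValueError(f"relation {relation!r} links inconsistent entity types")
        counts[relation] += 1
    return entities, list(relations.values()), dict(counts)


# Toy edge list; the aliases and the relation names r0/r1 are hypothetical.
edges_text = "C.0\tr0\tG.1\nC.1\tr0\tG.0\nC.0\tr1\tP.2\n"
edges = parse_edges(edges_text)
entities, relations, counts = summarize(edges)
print(entities)
print(relations)
print(counts)
```

Raising on a relation that reappears with different endpoint types is a design choice: each PBG relation has a single left-hand and right-hand entity type, so such a mismatch would indicate a malformed edge list.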