simba.pp.filter_features

simba.pp.filter_features(adata, min_n_samples=5, max_n_samples=None, min_pct_samples=None, max_pct_samples=None, min_n_counts=None, max_n_counts=None, expr_cutoff=1)

Filter out features based on different metrics.

Parameters:

adata (AnnData) – Annotated data matrix.
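min_n_samples (int, optional (default: 5)) – Minimum number of samples expressing one feature
max_n_samples (int, optional (default: None)) – Maximum number of samples expressing one feature
min_pct_samples (float, optional (default: None)) – Minimum percentage of samples expressing one feature
max_pct_samples (float, optional (default: None)) – Maximum percentage of samples expressing one feature
min_n_counts (int, optional (default: None)) – Minimum number of read counts for one feature
max_n_counts (int, optional (default: None)) – Maximum number of read counts for one feature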
expr_cutoff (float, optional (default: 1)) – Expression cutoff. A feature is considered ‘expressed’ if its value is greater than expr_cutoff
assay (str, optional (default: ‘rna’)) – Choose from {‘rna’, ‘atac’}, case-insensitive
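
The way the minimum and maximum thresholds combine can be sketched in plain Python. This is only an illustration, not the library's implementation: the helper names `within_bounds` and `select_features`, the per-feature statistics dictionary, and the choice of inclusive bounds are all assumptions. The threshold names follow the function signature, and a threshold of None is treated as "no limit".

```python
def within_bounds(value, minimum=None, maximum=None):
    """Return True if value lies inside the optional bounds (None = no bound).

    Assumption: both bounds are inclusive; the library may differ.
    """
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def select_features(stats, min_n_samples=5, max_n_samples=None,
                    min_pct_samples=None, max_pct_samples=None,
                    min_n_counts=None, max_n_counts=None):
    """Hypothetical helper: return names of features passing every threshold."""
    kept = []
    for name, s in stats.items():
        if (within_bounds(s["n_samples"], min_n_samples, max_n_samples)
                and within_bounds(s["pct_samples"], min_pct_samples, max_pct_samples)
                and within_bounds(s["n_counts"], min_n_counts, max_n_counts)):
            kept.append(name)
    return kept


# Made-up per-feature statistics, for illustration only.
stats = {
    "feat_a": {"n_samples": 12, "pct_samples": 0.60, "n_counts": 40},
    "feat_b": {"n_samples": 3, "pct_samples": 0.15, "n_counts": 5},
    "feat_c": {"n_samples": 20, "pct_samples": 1.00, "n_counts": 300},
}

kept_default = select_features(stats)                      # only min_n_samples=5 is active
kept_capped = select_features(stats, max_pct_samples=0.9)  # also drop near-ubiquitous features
```

With the defaults, only the minimum sample count is active, so rarely detected features are dropped. Adding an upper bound such as `max_pct_samples` additionally removes features detected in nearly every sample.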

Returns:

Updates adata in place, keeping only the subset of features that pass the filtering.
Also adds the following fields to adata if cal_qc() was not performed beforehand.
n_counts (pandas.Series (adata.var['n_counts'], dtype int)) – The number of read counts for each feature.
n_cells (pandas.Series (adata.var['n_cells'], dtype int)) – The number of cells in which each feature is expressed.
pct_cells (pandas.Series (adata.var['pct_cells'], dtype float)) – The percentage of cells in which each feature is expressed.
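
As a rough illustration of what these three fields describe, the sketch below computes them for a small count matrix stored as a list of rows (cells) by columns (features). It is not the library's code. The helper name `feature_metrics` is hypothetical, the comparison follows the "greater than expr_cutoff" wording literally (the library may treat equality differently), and `pct_cells` is assumed to be stored as a fraction between 0 and 1.

```python
def feature_metrics(matrix, expr_cutoff=1):
    """Hypothetical helper: per-feature n_counts, n_cells and pct_cells.

    matrix is a list of rows, one row per cell and one column per feature.
    Assumption: a value counts as 'expressed' only if strictly greater
    than expr_cutoff, and pct_cells is a fraction rather than 0-100.
    """
    n_cells_total = len(matrix)
    n_features = len(matrix[0]) if matrix else 0
    metrics = []
    for j in range(n_features):
        column = [row[j] for row in matrix]
        n_cells = sum(1 for value in column if value > expr_cutoff)
        metrics.append({
            "n_counts": sum(column),              # total read counts for the feature
            "n_cells": n_cells,                   # cells in which the feature is expressed
            "pct_cells": n_cells / n_cells_total, # fraction of cells expressing it
        })
    return metrics


# Made-up counts: 4 cells (rows) x 3 features (columns).
counts = [
    [0, 2, 5],
    [1, 3, 0],
    [0, 4, 2],
    [0, 0, 3],
]

metrics = feature_metrics(counts, expr_cutoff=1)
```

In this toy matrix the first feature has a single read in one cell, which does not exceed the cutoff of 1, so its `n_cells` is 0. The other two features are each expressed in three of the four cells.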