API
Decibel functions
- module.decibel.enge_transcriptional_noise(adata, batch)
Compute the transcriptional noise as the biological variation over the technical variation. It can only be computed in datasets with ERCC spike-ins. The biological variation is computed as the correlation distance between each cell and the average gene expression of all the cells of the same cell type and the same batch. The technical variation is computed as the correlation distance between each cell and the mean ERCC spike-in expression of all the cell of the same cell type and batch. The transcriptional noise is computed as the biological variation over the technical variation.
- Parameters
adata (
annData
) – annData object with gene expression data. It must contain a slot with the batch identity of each cell in adata.obsbatch (
str
) – batch label (Examples: ‘donor’, ‘patient’, ‘mouse’)
- Returns
adata – annData object with gene expression data. Euclidean distances are stored in adata.obs[‘cordist_bio’], adata.obs[‘cordist_tech’] and adata.obs[‘noise’]
- Return type
annData
- module.decibel.distance_to_celltype_mean(adata, batch)
Compute the distance between each cell and the mean expression of its cell type in the same batch (donor/mouse). It computes three distances: euclidean, correlation and manhattan.
- Parameters
adata (
annData
) – annData object with gene expression data. It must contain a slot with the batch identity of each cell in adata.obsbatch (
str
) – batch label (Examples: ‘donor’, ‘patient’, ‘mouse’)
- Returns
adata – annData object with gene expression data. Euclidean distances are stored in adata.obs[‘cor_dist’], adata.obs[‘euc_dist’] and adata.obs[‘man_dist’]
- Return type
annData
- module.decibel.enge_euclidean_dist(adata)
Compute the Euclidean distance to the average expression across cell types using a set of invariant genes. The invariant genes are selected as follows: 1) Create equally sized bins of genes according to their mean expression 2) Discard the two most extreme bins (lowest and highest mean expression) 3) Select the 10% with the lowest coefficient of variation within each of the remaining bins
- Parameters
adata (
annData
) – annData object with gene expression data.- Returns
adata – annData object with gene expression data. Euclidean distances are stored in adata.obs[‘euc_dist_tissue_invar’]
- Return type
annData
- module.decibel.pairwise_euclidean_sample(adata, cell_type, n)
- module.decibel.hernando_herraez(adata, batch)
Computes the correlation distance of each cell to the cell type median using the 500 most variably expressed genes.
- Parameters
adata (
annData
) – annData object with gene expression data. Cell type annotations must be stored in adata.obs[‘cell_type’]cell_type (
str
) – cell type labelbatch (
str
) – batch label (Examples: ‘donor’, ‘patient’, ‘mouse’)
- Returns
adata – annData object with gene expression data. Euclidean distances are stored in adata.obs[‘cor_dist_median’]
- Return type
annData
- module.decibel.distance_to_celltype_mean_invariant(adata, batch)
Compute the distance between each cell and the mean expression of its cell type in the same batch (donor/mouse), using a set of invariant genes as in Enge (2017). It computes three distances: euclidean, correlation and manhattan.
- Parameters
adata (
annData
) – annData object with gene expression data. It must contain a slot with the batch identity of each cell in adata.obsbatch (
str
) – batch label (Examples: ‘donor’, ‘patient’, ‘mouse’)
- Returns
adata – annData object with gene expression data. Euclidean distances are stored in adata.obs[‘cor_dist_invar’], adata.obs[‘euc_dist_invar’] and adata.obs[‘man_dist_invar’]
- Return type
annData
- module.decibel.gcl(adata, num_divisions)
Following the original GCL.m script provided by the authors (https://github.com/guy531/gcl).
- module.decibel.gcl_per_cell_type_and_batch(adata, num_divisions, batch)
Compute GCL for each cell type and batch in adata.obs[‘batch’].
- Parameters
adata (
annData
) – annData object with gene expression data. It must contain a slot with the batch identity of each cell in adata.obsnum_divisions (int) – number of iterations to use in gcl()
- Returns
output – Pandas dataframe with the GCL per cell type x batch x iteration
- Return type
pd.DataFrame
- module.decibel.rerun_preprocessing(adata, batch_key)
Re-runs preprocessing steps: filter lowly expressed genes, compute HVGs, run batch-effect corrected PCA (harmony), neighbors.
- Parameters
adata (
annData
) – annData object with gene expression data. It must contain a slot with the batch identity of each cell in adata.obsbatch_key (
str
) – batch label to run batch-effect correction (Examples: ‘donor’, ‘patient’, ‘mouse’)
- Returns
adata – updated annData object with gene expression data.
- Return type
annData
- module.decibel.scallop_pipeline(adata, res_vals=None)
Compute transcriptional noise as 1 - membership score (averaged over aa range of resolution values). It runs the whole Scallop pipeline: 1) Create separate annData object per condition (young/old, smoker/non-smoker) and cell type. 2) Re-run preprocessing on annData 3) Run Scallop over range of resolution values 4) Compute average membership score across resolutions 5) Transcriptional noise = 1 - mean membership score
- Parameters
adata (
annData
) – annData object with gene expression data. It must contain a slot with the batch identity of each cell in adata.obsbatch (
str
) – batch label (Examples: ‘donor’, ‘patient’, ‘mouse’)
- Returns
adata – annData object with gene expression data. Euclidean distances are stored in adata.obs[‘cor_dist_invar’], adata.obs[‘euc_dist_invar’] and adata.obs[‘man_dist_invar’]
- Return type
annData