spac package
Submodules
spac.data_utils module
- spac.data_utils.add_pin_color_rules(adata, label_color_dict: dict, color_map_name: str = '_spac_colors', overwrite: bool = True) Tuple[dict, str] [source]
Adds pin color rules to the AnnData object and scans for matching labels.
This function scans the unique labels in each adata.obs column and the column names in all adata tables to find the labels defined by the pin color rules.
- Parameters:
adata – The anndata object containing upstream analysis.
label_color_dict (dict) – Dictionary of pin color rules with label as key and color as value.
color_map_name (str) – The name to use for storing pin color rules in adata.uns.
overwrite (bool, optional) – Whether to overwrite existing pin color rules in adata.uns with the same name, by default True.
- Returns:
label_matches (dict) – Dictionary with the matching labels in each section (obs, var, X, etc.).
result_str (str) – Summary string with the matching labels in each section (obs, var, X, etc.).
- Raises:
ValueError – If color_map_name already exists in adata.uns and overwrite is False.
- spac.data_utils.add_rescaled_features(adata, min_quantile, max_quantile, layer)[source]
Clip and rescale the features matrix.
The results will be added into a new layer in the AnnData object.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
min_quantile (float) – The minimum quantile to rescale to zero.
max_quantile (float) – The maximum quantile to rescale to one.
layer (str) – The name of the new layer to add to the anndata object.
- spac.data_utils.append_annotation(data: DataFrame, annotation: dict) DataFrame [source]
Append a new annotation with a single value to a Pandas DataFrame based on mapping rules.
- Parameters:
data (pd.DataFrame) – The input DataFrame to which the new observation will be appended.
annotation (dict) – Dictionary of string pairs representing the new annotation and its value. Each pair should have the format <new annotation column name>:<value of the annotation>. Each value must be a single string or numeric value.
- Returns:
The DataFrame with the new observation appended.
- Return type:
pd.DataFrame
- spac.data_utils.bin2cat(data, one_hot_annotations, new_annotation)[source]
Combine a set of columns representing a binary one hot encoding of categories into a new categorical column.
- Parameters:
data (pandas.DataFrame) – The pandas dataframe containing the one hot encoded annotations.
one_hot_annotations (str or list of str) – A string or a list of strings representing python regular expression of the one hot encoded annotations columns in the data frame.
new_annotation (str) – The column name for new categorical annotation to be created.
- Returns:
pandas.DataFrame – DataFrame with new categorical column added.
Example
>>> data = pd.DataFrame({
...     'A': [1, 1, 0, 0],
...     'B': [0, 0, 1, 0]
... })
>>> one_hot_annotations = ['A', 'B']
>>> new_annotation = 'new_category'
>>> result = bin2cat(data, one_hot_annotations, new_annotation)
>>> print(result[new_annotation])
0      A
1      A
2      B
3    NaN
Name: new_category, dtype: object
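The mechanics of collapsing one-hot columns into a single categorical column can be sketched with plain pandas. This is only an illustration, not the library's implementation: it assumes the columns are given by name (the variable one_hot_columns is hypothetical) rather than matched by regular expression, and that each row has at most one positive column.

```python
import pandas as pd

data = pd.DataFrame({"A": [1, 1, 0, 0], "B": [0, 0, 1, 0]})
one_hot_columns = ["A", "B"]  # hypothetical explicit list instead of regex matching

one_hot = data[one_hot_columns]
# idxmax returns the first column holding the row maximum; rows without any
# positive column are masked to NaN so they do not silently become "A".
labels = one_hot.idxmax(axis=1).where(one_hot.sum(axis=1) > 0)
data["new_category"] = labels
print(data["new_category"])
```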
- spac.data_utils.calculate_centroid(data, x_min, x_max, y_min, y_max, new_x, new_y)[source]
Calculate the spatial coordinates of the cell centroid as the average of min and max coordinates.
- Parameters:
data (pd.DataFrame) – The input data frame. The dataframe should contain four columns for x_min, x_max, y_min, and y_max for centroid calculation.
x_min (str) – column name with minimum x value
x_max (str) – column name with maximum x value
y_min (str) – column name with minimum y value
y_max (str) – column name with maximum y value
new_x (str) – the new column name of the x dimension of the centroid; allowed characters are letters, digits, and underscores
new_y (str) – the new column name of the y dimension of the centroid; allowed characters are letters, digits, and underscores
- Returns:
data – dataframe with two new centroid columns added. Note that the dataframe is modified in place.
- Return type:
pd.DataFrame
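The centroid calculation described above is the midpoint of the bounding box on each axis. A minimal pandas sketch follows; the column names XMin, XMax, YMin, YMax, centroid_x, and centroid_y are made up for illustration.

```python
import pandas as pd

cells = pd.DataFrame({
    "XMin": [0.0, 10.0], "XMax": [4.0, 20.0],
    "YMin": [2.0, 0.0], "YMax": [6.0, 5.0],
})

# Centroid = average of the minimum and maximum coordinate on each axis.
cells["centroid_x"] = (cells["XMin"] + cells["XMax"]) / 2
cells["centroid_y"] = (cells["YMin"] + cells["YMax"]) / 2
print(cells[["centroid_x", "centroid_y"]])
```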
- spac.data_utils.combine_annotations(adata: AnnData, annotations: list, separator: str, new_annotation_name: str) AnnData [source]
Combine multiple annotations into a new annotation using a defined separator.
- Parameters:
adata (AnnData) – The input AnnData object whose .obs will be modified.
annotations (list) – List of annotation column names to combine.
separator (str) – Separator to use when combining annotations.
new_annotation_name (str) – The name of the new annotation to be created.
- Returns:
The AnnData object with the combined annotation added.
- Return type:
AnnData
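Combining annotations amounts to joining the string values of several columns row by row. The sketch below uses a plain DataFrame as a stand-in for adata.obs; the column names and the new annotation name are hypothetical, and this is not necessarily how the library performs the join.

```python
import pandas as pd

# Stand-in for adata.obs with two annotation columns.
obs = pd.DataFrame({"region": ["R1", "R1", "R2"], "cell_type": ["T", "B", "T"]})
separator = "_"

# Cast to str first so that categorical or numeric annotations can be joined.
obs["region_cell_type"] = (
    obs[["region", "cell_type"]].astype(str).agg(separator.join, axis=1)
)
print(obs["region_cell_type"].tolist())
```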
- spac.data_utils.combine_dfs(dataframes: list)[source]
Combine multiple pandas dataframes into one. The schema of the first dataframe is considered primary. A warning will be printed if the schema of the current dataframe differs from the primary.
- Parameters:
dataframes (list[pd.DataFrame]) – A list of pandas dataframes to be combined
- Return type:
A pd.DataFrame of the combined dataframes.
- spac.data_utils.concatinate_regions(regions)[source]
Concatenate data from multiple regions and create new indexes.
- Parameters:
regions (list of anndata.AnnData) – AnnData objects to be concatenated.
- Returns:
New AnnData object with the concatenated values in AnnData.X
- Return type:
anndata.AnnData
- spac.data_utils.downsample_cells(input_data, annotations, n_samples=None, stratify=False, rand=False, combined_col_name='_combined_', min_threshold=5)[source]
Custom downsampling of data based on one or more annotations.
This function offers two primary modes of operation:
1. Grouping (stratify=False):
- For a single annotation: The data is grouped by unique values of the annotation, and ‘n_samples’ rows are selected from each group.
- For multiple annotations: The data is grouped based on unique combinations of the annotations, and ‘n_samples’ rows are selected from each combined group.
2. Stratification (stratify=True):
- Annotations (single or multiple) are combined into a new column.
- Proportionate stratified sampling is performed based on the unique combinations in the new column, ensuring that the downsampled dataset maintains the proportionate representation of each combined group from the original dataset.
- Parameters:
input_data (pd.DataFrame) – The input data frame.
annotations (str or list of str) – The column name(s) to downsample on. If multiple column names are provided, their values are combined using an underscore as a separator.
n_samples (int, default=None) – The number of samples to return. Behavior differs based on the ‘stratify’ parameter:
- stratify=False: Returns ‘n_samples’ for each unique value (or combination) of annotations.
- stratify=True: Returns a total of ‘n_samples’ stratified by the frequency of every label or combined labels in the annotation(s).
stratify (bool, default=False) – If true, perform proportionate stratified sampling based on the unique combinations of annotations. This ensures that the downsampled dataset maintains the proportionate representation of each combined group from the original dataset.
rand (bool, default=False) – If true and stratify is True, randomly select the returned cells. Otherwise, choose the first n cells.
combined_col_name (str, default='_combined_') – Name of the column that will store combined values when multiple annotation columns are provided.
min_threshold (int, default=5) – The minimum number of samples a combined group should have in the original dataset to be considered in the downsampled dataset. Groups with fewer samples than this threshold will be excluded from the stratification process. Adjusting this parameter determines the minimum presence a combined group should have in the original dataset to appear in the downsampled version.
- Returns:
output_data – The proportionately stratified downsampled data frame.
- Return type:
pd.DataFrame
Notes
This function emphasizes proportionate stratified sampling, ensuring that the downsampled dataset is a representative subset of the original data with respect to the combined annotations. Due to this proportionate nature, not all unique combinations from the original dataset might be present in the downsampled dataset, especially if a particular combination has very few samples in the original dataset. The min_threshold parameter can be adjusted to determine the minimum number of samples a combined group should have in the original dataset to appear in the downsampled version.
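The two modes can be contrasted with a small pandas sketch. It is an illustration under stated assumptions, not the library's algorithm: the per-group quota is computed by simple rounding, rows are taken from the top of each group (as with rand=False), and the threshold of 5 mirrors the documented min_threshold default.

```python
import pandas as pd

cells = pd.DataFrame({"cell_type": ["A"] * 12 + ["B"] * 6 + ["C"] * 2})

# Mode 1 (grouping): take the first 2 rows of every group.
per_group = cells.groupby("cell_type").head(2)

# Mode 2 (stratification): drop groups below the threshold, then split a
# total of 9 samples proportionally to the remaining group sizes.
counts = cells["cell_type"].value_counts()
kept = counts[counts >= 5]                      # "C" has only 2 cells
quota = (kept / kept.sum() * 9).round().astype(int)
stratified = pd.concat(
    [cells[cells["cell_type"] == label].head(n) for label, n in quota.items()]
)
print(stratified["cell_type"].value_counts())
```

Group "C" is absent from the stratified result, which is the effect the Notes describe for sparsely populated groups.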
- spac.data_utils.ingest_cells(dataframe, regex_str, x_col=None, y_col=None, annotation=None)[source]
Read the csv file into an anndata object.
The function will also initialize features and spatial coordinates.
- Parameters:
dataframe (pandas.DataFrame) – The data frame that contains cells as rows and cell information as columns.
regex_str (str or list of str) – A string or a list of strings representing Python regular expressions for the feature columns in the data frame.
x_col (str) – The column name for the x coordinate of the cell.
y_col (str) – The column name for the y coordinate of the cell.
annotation (str or list of str) – The column name for the region that the cells belong to. If a list is passed, multiple annotations will be created in the returned AnnData object.
- Returns:
The generated AnnData object
- Return type:
anndata.AnnData
- spac.data_utils.load_csv_files(file_names)[source]
Read the csv file(s) into a pandas dataframe.
- Parameters:
file_names (str or list) – A CSV file path or a list of CSV file paths to be combined into a single dataframe.
- Returns:
A pandas dataframe of all the csv files. The returned dataset will have an extra column called “loaded_file_name” containing the source file name.
- Return type:
pandas.DataFrame
- spac.data_utils.rescale_features(features, min_quantile=0.01, max_quantile=0.99)[source]
Clip and rescale features outside the minimum and maximum quantile.
The rescaled features will be between 0 and 1.
- Parameters:
features (pandas.DataFrame) – The DataFrame of features.
min_quantile (float) – The minimum quantile to be considered zero.
max_quantile (float) – The maximum quantile to be considered one.
- Returns:
The created DataFrame with normalized features.
- Return type:
pandas.DataFrame
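A minimal sketch of quantile clipping followed by rescaling to the range 0 to 1, written with plain pandas. The function name rescale_by_quantile is hypothetical, and the sketch assumes the two quantiles of a column differ (otherwise the division is by zero).

```python
import pandas as pd

def rescale_by_quantile(features, min_quantile=0.01, max_quantile=0.99):
    """Clip each column to its quantile range, then map that range to [0, 1]."""
    low = features.quantile(min_quantile)    # one value per column
    high = features.quantile(max_quantile)
    clipped = features.clip(lower=low, upper=high, axis=1)
    return (clipped - low) / (high - low)

features = pd.DataFrame({"m1": [0.0, 1.0, 2.0, 3.0, 100.0]})
scaled = rescale_by_quantile(features, min_quantile=0.0, max_quantile=0.75)
print(scaled)
```

The outlier 100.0 is clipped to the 0.75 quantile (3.0) and therefore maps to 1.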
- spac.data_utils.select_values(data, annotation, values=None, exclude_values=None)[source]
Selects values from either a pandas DataFrame or an AnnData object based on the annotation and values.
- Parameters:
data (pandas.DataFrame or anndata.AnnData) – The input data. Can be a DataFrame for tabular data or an AnnData object.
annotation (str) – The column name in a DataFrame or the annotation key in an AnnData object to be used for selection.
values (str or list of str) – List of values for the annotation to include. If None, all values are considered for selection.
exclude_values (str or list of str) – List of values for the annotation to exclude. Can’t be combined with values.
- Returns:
The filtered DataFrame or AnnData object containing only the selected rows based on the annotation and values.
- Return type:
pandas.DataFrame or anndata.AnnData
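For the DataFrame case, the selection logic can be sketched as below. The helper name select_rows is hypothetical, and raising ValueError when both arguments are given is an assumption; the documentation only states that the two cannot be combined.

```python
import pandas as pd

def select_rows(df, annotation, values=None, exclude_values=None):
    """Keep rows whose annotation is in `values`, or not in `exclude_values`."""
    if values is not None and exclude_values is not None:
        raise ValueError("Pass either values or exclude_values, not both.")
    if values is not None:
        wanted = [values] if isinstance(values, str) else list(values)
        return df[df[annotation].isin(wanted)]
    if exclude_values is not None:
        unwanted = (
            [exclude_values] if isinstance(exclude_values, str)
            else list(exclude_values)
        )
        return df[~df[annotation].isin(unwanted)]
    return df  # nothing specified: all values are kept

cells = pd.DataFrame({"cell_type": ["T", "B", "NK", "T"]})
only_t = select_rows(cells, "cell_type", values="T")
no_t = select_rows(cells, "cell_type", exclude_values=["T"])
```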
- spac.data_utils.subtract_min_per_region(adata, annotation, layer, min_quantile=0.01)[source]
Subtract the minimum quantile of every marker per region.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
annotation (str) – The name of the annotation in adata to define batches.
min_quantile (float) – The minimum quantile to rescale to zero.
layer (str) – The name of the new layer to add to the AnnData object.
- spac.data_utils.subtract_min_quantile(features, min_quantile=0.01)[source]
Subtract the features defined by the minimum quantile from all columns.
- Parameters:
features (pandas.DataFrame) – The dataframe of features.
min_quantile (float) – The minimum quantile to be considered zero.
- Returns:
dataframe with rescaled features.
- Return type:
pandas.DataFrame
spac.phenotyping module
- spac.phenotyping.apply_phenotypes(data_df, phenotypes_dic)[source]
Add binary columns to the DataFrame indicating if each cell matches a phenotype.
- Parameters:
data_df (pandas.DataFrame) – The DataFrame to which binary phenotype columns will be added.
phenotypes_dic (dict) – A dictionary where the keys are phenotype names and the values are dictionaries mapping column names to values.
- Returns:
A dictionary where the keys are phenotype names and the values are the counts of rows that match each phenotype.
- Return type:
dict
Notes
The function creates binary columns in the DataFrame for each phenotype and counts the number of rows matching each phenotype.
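As a rough sketch of the behavior described above, each phenotype rule set can be evaluated as a conjunction of column comparisons. The column and phenotype names are invented for illustration, and the library's own implementation may differ.

```python
import pandas as pd

data_df = pd.DataFrame({"cd4": [1, 1, 0], "cd8": [0, 1, 1]})
phenotypes_dic = {
    "cd4_only": {"cd4": 1, "cd8": 0},
    "double_pos": {"cd4": 1, "cd8": 1},
}

counts = {}
for name, rules in phenotypes_dic.items():
    # A cell matches a phenotype only if every column equals its required value.
    match = pd.Series(True, index=data_df.index)
    for column, value in rules.items():
        match &= data_df[column] == value
    data_df[name] = match.astype(int)   # binary phenotype column
    counts[name] = int(match.sum())     # number of matching rows

print(counts)
```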
- spac.phenotyping.assign_manual_phenotypes(data_df, phenotypes_df, annotation='manual_phenotype', prefix='', suffix='', multiple=True, drop_binary_code=True)[source]
Assign manual phenotypes to the DataFrame and generate summaries.
- Parameters:
data_df (pandas.DataFrame) – The DataFrame to which manual phenotypes will be assigned.
phenotypes_df (pandas.DataFrame) – A DataFrame containing phenotype definitions with columns:
- “phenotype_name” : str
The name of the phenotype.
- “phenotype_code” : str
The code used to decode the phenotype.
annotation (str, optional) – The name of the column to store the combined phenotype. Default is “manual_phenotype”.
prefix (str, optional) – Prefix to be added to the column names. Default is ‘’.
suffix (str, optional) – Suffix to be added to the column names. Default is ‘’.
multiple (bool, optional) – Whether to concatenate the names of multiple positive phenotypes. Default is True.
drop_binary_code (bool, optional) – Whether to drop the binary phenotype columns. Default is True.
- Returns:
A dictionary with the following keys:
- “phenotypes_counts”: dict
Counts of cells matching each defined phenotype.
- “assigned_phenotype_counts”: dict
Counts of cells matching different numbers of phenotypes.
- “multiple_phenotypes_summary”: pandas.DataFrame
Summary of cells with multiple phenotypes.
- Return type:
dict
Notes
The function generates a combined phenotype column, prints summaries of cells matching multiple phenotypes, and returns a dictionary with detailed counts and summaries.
Examples
Suppose data_df is a DataFrame with binary phenotype columns and phenotypes_df contains the following definitions:
>>> data_df = pd.DataFrame({
...     'cd4_phenotype': [0, 1, 0, 1],
...     'cd8_phenotype': [0, 0, 1, 1]
... })
>>> phenotypes_df = pd.DataFrame([
...     {"phenotype_name": "cd4_cells", "phenotype_code": "cd4+"},
...     {"phenotype_name": "cd8_cells", "phenotype_code": "cd8+"},
...     {"phenotype_name": "cd4_cd8", "phenotype_code": "cd4+cd8+"}
... ])
>>> result = assign_manual_phenotypes(
...     data_df,
...     phenotypes_df,
...     annotation="manual",
...     prefix='',
...     suffix='_phenotype',
...     multiple=True
... )
The data_df DataFrame will be edited in place to include a new column “manual” with the combined phenotype labels:
>>> print(data_df)
   cd4_phenotype  cd8_phenotype              manual
0              0              0            no_label
1              1              0           cd4_cells
2              0              1           cd8_cells
3              1              1  cd8_cells, cd4_cd8
The result dictionary contains counts and summaries as follows:
>>> print(result["phenotypes_counts"])
{'cd4_cells': 1, 'cd8_cells': 2, 'cd4_cd8': 1}
>>> print(result["assigned_phenotype_counts"])
0    1
1    2
2    1
Name: num_phenotypes, dtype: int64
>>> print(result["multiple_phenotypes_summary"])
               manual  count
0  cd8_cells, cd4_cd8      1
- spac.phenotyping.combine_phenotypes(data_df, phenotype_columns, multiple=True)[source]
Combine multiple binary phenotype columns into a new column in a vectorized manner.
- Parameters:
data_df (pandas.DataFrame) – DataFrame containing the phenotype columns.
phenotype_columns (list of str) – List of binary phenotype column names.
multiple (bool, optional) – Whether to concatenate the names of multiple positive phenotypes. If False, all multiple positive phenotypes are labeled as “no_label”. Default is True.
- Returns:
A Series representing the combined phenotype for each row.
- Return type:
pandas.Series
- spac.phenotyping.decode_phenotype(data, phenotype_code, **kwargs)[source]
Convert a phenotype code into a dictionary mapping feature (marker) names to values for that marker’s classification as ‘+’ or ‘-’.
- Parameters:
data (pandas.DataFrame) – The DataFrame containing the columns that will be used to decode the phenotype.
phenotype_code (str) – The phenotype code string, which should end with ‘+’ or ‘-’.
**kwargs (keyword arguments) – Optional keyword arguments to specify prefix and suffix to be added to the column names.
- prefix : str, optional
Prefix to be added to the column names for the feature classification. Default is ‘’.
- suffix : str, optional
Suffix to be added to the column names for the feature classification. Default is ‘’.
- Returns:
A dictionary where the keys are column names and the values are the corresponding phenotype classification.
- Return type:
dict
- Raises:
ValueError – If the phenotype code does not end with ‘+’ or ‘-’ or if any columns specified in the phenotype code do not exist in the DataFrame.
Notes
The function splits the phenotype code on ‘+’ and ‘-’ characters to determine the phenotype columns and values. It checks if the columns exist in the DataFrame and whether they are binary or string types to properly map values.
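The splitting step described in the Notes can be sketched with a regular expression. The helper decode_code is hypothetical and deliberately simplified: it returns the raw sign for each marker and skips both the check that the columns exist in the DataFrame and the mapping of signs to binary or string values.

```python
import re

def decode_code(phenotype_code, prefix="", suffix=""):
    """Split a code such as 'cd4+cd8-' into {column_name: sign} pairs."""
    if not phenotype_code or phenotype_code[-1] not in "+-":
        raise ValueError("Phenotype code must end with '+' or '-'.")
    # Each marker is a run of characters other than '+'/'-' followed by a sign.
    pairs = re.findall(r"([^+-]+)([+-])", phenotype_code)
    return {f"{prefix}{marker}{suffix}": sign for marker, sign in pairs}

decoded = decode_code("cd4+cd8-", suffix="_phenotype")
print(decoded)
```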
- spac.phenotyping.generate_phenotypes_dict(data_df, phenotypes_df, prefix='', suffix='')[source]
Generate a dictionary of phenotype names to their corresponding decoding rules.
- Parameters:
data_df (pandas.DataFrame) – The DataFrame containing the columns that will be used to decode the phenotypes.
phenotypes_df (pandas.DataFrame) – A DataFrame containing phenotype definitions with columns:
- “phenotype_name” : str
The name of the phenotype.
- “phenotype_code” : str
The code used to decode the phenotype.
prefix (str, optional) – Prefix to be added to the column names. Default is ‘’.
suffix (str, optional) – Suffix to be added to the column names. Default is ‘’.
- Returns:
A dictionary where the keys are phenotype names and the values are dictionaries mapping column names to values.
- Return type:
dict
Notes
The function iterates over each row in the phenotypes_df DataFrame and decodes the phenotype using the decode_phenotype function.
- spac.phenotyping.is_binary_0_1(column)[source]
Check if a pandas Series contains only binary values (0 and 1).
- Parameters:
column (pandas.Series) – The pandas Series to check.
- Returns:
True if the Series contains only 0 and 1, False otherwise.
- Return type:
bool
Notes
The function considers a Series to be binary if it contains exactly the values 0 and 1, and no other values.
spac.spatial_analysis module
- spac.spatial_analysis.calculate_nearest_neighbor(adata, annotation, spatial_associated_table='spatial', imageid=None, label='spatial_distance', verbose=True)[source]
Computes the shortest distance from each cell to the nearest cell of each phenotype (via scimap.tl.spatial_distance) and stores the resulting DataFrame in adata.obsm[label].
- Parameters:
adata (anndata.AnnData) – Annotated data matrix with spatial information.
annotation (str) – Column name in adata.obs containing cell annotations (i.e. phenotypes).
spatial_associated_table (str, optional) – Key in adata.obsm where spatial coordinates are stored. Default is ‘spatial’.
imageid (str, optional) – The column in adata.obs specifying image IDs. If None, a dummy image column is created temporarily and spatial distances are computed across the entire dataset as if it were one image.
label (str, optional) – The key under which results are stored in adata.obsm. Default is ‘spatial_distance’.
verbose (bool, optional) – If True, prints progress messages. Default is True.
- Returns:
Modifies adata in place by storing a DataFrame of spatial distances in adata.obsm[label].
- Return type:
None
Example
For a dataset with two cells (CellA, CellB) both of the same phenotype “type1”, the output might look like:
>>> adata.obsm['spatial_distance']
       type1
CellA    0.0
CellB    0.0
For a dataset with two phenotypes “type1” and “type2”, the output might look like:
>>> adata.obsm['spatial_distance']
          type1     type2
CellA  0.000000  1.414214
CellB  1.414214  0.000000
- Input:
- adata.obs:
cell_type  imageid
type1      image1
type1      image1
type2      image1
- adata.obsm[‘spatial’]:
[[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
- Output stored in adata.obsm[‘spatial_distance’]:
   type1  type2
0  0.0    1.414
1  1.414  0.0
2  2.236  1.0
- Raises:
ValueError – If spatial_associated_table is not found in adata.obsm. If spatial coordinates are missing or invalid.
- spac.spatial_analysis.neighborhood_profile(adata, phenotypes, distances, regions=None, spatial_key='spatial', normalize=None, associated_table_name='neighborhood_profile')[source]
Calculate the neighborhood profile for every cell in all slides in an analysis and update the input AnnData object in place.
- Parameters:
adata (AnnData) – The AnnData object containing the spatial coordinates and phenotypes.
phenotypes (str) – The name of the column in adata.obs that contains the phenotypes.
distances (list) – The list of increasing distances for the neighborhood profile.
spatial_key (str, optional) – The key in adata.obsm that contains the spatial coordinates. Default is ‘spatial’.
normalize (str or None, optional) – If ‘total_cells’, normalize the neighborhood profile based on the total number of cells in each bin. If ‘bin_area’, normalize the neighborhood profile based on the area of every bin. Default is None.
associated_table_name (str, optional) – The name of the column in adata.obsm that will contain the neighborhood profile. Default is ‘neighborhood_profile’.
regions (str or None, optional) – The name of the column in adata.obs that contains the regions. If None, all cells in adata will be used. Default is None.
- Returns:
The function modifies the input AnnData object in place, adding a new column containing the neighborhood profile to adata.obsm.
- Return type:
None
Notes
The input AnnData object ‘adata’ is modified in place. The function adds a new entry containing the neighborhood profile to adata.obsm, named by the parameter ‘associated_table_name’. The entry is a 3D array of shape (n_cells, n_phenotypes, n_bins), where n_cells is the number of cells in all slides, n_phenotypes is the number of unique phenotypes, and n_bins is the number of bins in the distances list.
A dictionary is added to adata.uns[associated_table_name] with the two keys “bins” and “labels”. “labels” will store all the values in the phenotype annotation.
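To make the (n_cells, n_phenotypes, n_bins) layout concrete, here is a small NumPy sketch that counts neighbors per phenotype and distance bin. It rests on several assumptions that the documentation does not confirm: consecutive entries of distances are treated as bin edges (so n_bins = len(distances) - 1), the cell itself is excluded from its own profile, and no normalization is applied.

```python
import numpy as np

coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
labels = np.array(["A", "B", "A"])
distances = [0, 2, 4]                      # assumed bin edges: [0, 2) and [2, 4]

phenotypes = np.unique(labels)
n_bins = len(distances) - 1
profile = np.zeros((len(coords), len(phenotypes), n_bins))

# Pairwise Euclidean distances between all cells.
pairwise = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

for i in range(len(coords)):
    for p, name in enumerate(phenotypes):
        neighbors = labels == name
        neighbors[i] = False               # assumption: skip the cell itself
        counts, _ = np.histogram(pairwise[i, neighbors], bins=distances)
        profile[i, p] = counts

print(profile.shape)
print(profile[0])
```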
- spac.spatial_analysis.ripley_l(adata, annotation, phenotypes, distances, regions=None, spatial_key='spatial', n_simulations=1, area=None, seed=42)[source]
Calculate Ripley’s L statistic for spatial data in adata.
Ripley’s L statistic is a spatial point pattern analysis metric that quantifies clustering or regularity in point patterns across various distances. This function calculates the statistic for each region in adata (if provided) or for all cells if regions are not specified.
- Parameters:
adata (AnnData) – The annotated data matrix containing the spatial coordinates and cell phenotypes.
annotation (str) – The key in adata.obs representing the annotation for cell phenotypes.
phenotypes (list of str) – A list containing two phenotypes for which the Ripley L statistic will be calculated. If the two phenotypes are the same, the calculation is done for the same type; if different, it considers interactions between the two.
distances (array-like) – An array of distances at which to calculate Ripley’s L statistic. The values must be positive and incremental.
regions (str or None, optional) – The key in adata.obs representing regions for stratifying the data. If None, all cells will be treated as one region.
spatial_key (str, optional) – The key in adata.obsm representing the spatial coordinates. Default is “spatial”.
n_simulations (int, optional) – Number of simulations to perform for significance testing. Default is 1.
area (float or None, optional) – The area of the spatial region of interest. If None, the area will be inferred from the data. Default is None.
seed (int, optional) – Random seed for simulation reproducibility. Default is 42.
- Returns:
A DataFrame containing the Ripley’s L results for each region or the entire dataset if regions is None. The DataFrame includes the following columns:
- region: The region label or ‘all’ if no regions are specified.
- center_phenotype: The first phenotype in phenotypes.
- neighbor_phenotype: The second phenotype in phenotypes.
- ripley_l: The Ripley’s L statistic calculated for the region.
- config: A dictionary with configuration settings used for the calculation.
- Return type:
pd.DataFrame
Notes
Ripley’s L is an adjusted version of Ripley’s K that corrects for the inherent increase in point-to-point distances as the distance grows. This statistic is used to evaluate spatial clustering or dispersion of points (cells) in biological datasets.
The function uses pre-defined distances and performs simulations to assess the significance of observed patterns. The results are stored in the .uns attribute of adata under the key ‘ripley_l’, or in a new DataFrame if no prior results exist.
Examples
Calculate Ripley’s L for two phenotypes in a single region dataset:
>>> result = ripley_l(adata, annotation='cell_type', phenotypes=['A', 'B'], distances=np.linspace(0, 500, 100))
Calculate Ripley’s L for multiple regions in adata:
>>> result = ripley_l(adata, annotation='cell_type', phenotypes=['A', 'B'], distances=np.linspace(0, 500, 100), regions='region_key')
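For intuition about the statistic itself, the sketch below computes a naive single-phenotype estimate using the common definition L(r) = sqrt(K(r) / pi), where K(r) is the area-scaled fraction of ordered point pairs within distance r. It is a textbook-style illustration without edge correction or simulations, not the estimator used by this function; ripley_l_sketch and its arguments are hypothetical.

```python
import math
import numpy as np

def ripley_l_sketch(points, distances, area):
    """Naive Ripley's L for one point set, with no edge correction."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    off_diagonal = ~np.eye(n, dtype=bool)   # ignore each point's distance to itself
    l_values = []
    for r in distances:
        pair_count = np.count_nonzero(dist[off_diagonal] <= r)
        k = area * pair_count / (n * (n - 1))
        l_values.append(math.sqrt(k / math.pi))
    return np.array(l_values)

# Four cells on the corners of a unit square.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
l_values = ripley_l_sketch(corners, distances=[0.5, 1.0], area=1.0)
print(l_values)
```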
- spac.spatial_analysis.spatial_interaction(adata, annotation, analysis_method, stratify_by=None, ax=None, return_matrix=False, seed=None, coord_type=None, n_rings=1, n_neighs=6, radius=None, cmap='seismic', **kwargs)[source]
Perform spatial analysis on the selected annotation in the dataset. Current analysis methods are provided in squidpy:
Neighborhood Enrichment, Cluster Interaction Matrix
- Parameters:
adata (anndata.AnnData) – The AnnData object.
annotation (str) – The column name of the annotation (e.g., phenotypes) to analyze in the provided dataset.
analysis_method (str) – The analysis method to use, currently available: “Neighborhood Enrichment” and “Cluster Interaction Matrix”.
stratify_by (str or list of str) – The annotation(s) used to stratify the dataset when generating interaction plots. If a single annotation is passed, the dataset will be stratified by the unique labels in the annotation column. If n (n>=2) annotations are passed, the dataset will be stratified based on the existing combinations of labels in the passed annotations.
ax (matplotlib.axes.Axes, default None) – The matplotlib Axes to display the image. This option is only available when stratify_by is None.
return_matrix (boolean, default False) – If True, the function will return a list of two dictionaries, the first containing the axes and the second containing the computed matrix. Note that for Neighborhood Enrichment, the matrix will be a tuple with the z-score and the enrichment count. For Cluster Interaction Matrix, it will return the interaction matrix. If False, the function will return only the axes dictionary.
seed (int, default None) – Random seed for reproducibility, used in Neighborhood Enrichment Analysis.
coord_type (str, optional) – Type of coordinate system used in sq.gr.spatial_neighbors. Should be either ‘grid’ (Visium Data) or ‘generic’ (Others). Default is None, in which case the squidpy package decides: if spatial_key is in anndata.uns the coord_type would be ‘grid’, otherwise ‘generic’.
n_rings (int, default 1) – Number of rings of neighbors for grid data. Only used when coord_type = ‘grid’ (Visium)
n_neighs (int, optional) – Default is 6. Depending on the coord_type:
- ‘grid’ (Visium) - number of neighboring tiles.
- ‘generic’ - number of neighborhoods for non-grid data.
radius (float or tuple, optional) – Default is None. Only available when coord_type = ‘generic’. Depending on the type:
- float - compute the graph based on neighborhood radius.
- tuple - prune the final graph to only contain edges in interval [min(radius), max(radius)].
cmap (str, default 'seismic') – The colormap to use for the plot. The ‘seismic’ color map consists of three color regions: red for positive, blue for negative, and white at the center. This color map effectively represents the nature of the spatial interaction analysis results, where positive values indicate clustering and negative values indicate separation. For more color maps, please visit https://matplotlib.org/stable/tutorials/colors/colormaps.html
**kwargs – Keyword arguments for matplotlib.pyplot.text()
- Returns:
A dictionary containing the results of the spatial interaction analysis. The keys of the dictionary depend on the parameters passed to the function:
- Ax : dict or matplotlib.axes.Axes
If stratify_by is not used, returns a single matplotlib.axes.Axes object. If stratify_by is used, returns a dictionary of Axes objects, with keys representing the stratification groups.
- Matrix : dict, optional
Contains processed DataFrames of computed matrices with row and column labels applied. If stratify_by is used, the keys represent the stratification groups. For example:
- results[‘Matrix’][‘GroupA’] for a specific stratification group.
- If stratify_by is not used, the table is accessible via results[‘Matrix’][‘annotation’].
- Return type:
dict
spac.transformations module
- spac.transformations.apply_per_batch(data, annotation, method, **kwargs)[source]
Apply a given function to data per batch, with additional parameters.
- Parameters:
data (np.ndarray) – The data to transform.
annotation (np.ndarray) – Batch annotations for each row in the data.
method (str) – The function to apply to each batch. Options: ‘arcsinh_transformation’ or ‘normalize_features’.
kwargs – Additional parameters to pass to the function.
- Returns:
The transformed data.
- Return type:
np.ndarray
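The per-batch pattern can be sketched as a loop over the unique batch labels. Note one deliberate difference from the documented signature: this hypothetical helper takes a callable rather than the name of a method as a string, and center_columns is an invented stand-in for the real transformations.

```python
import numpy as np

def apply_per_batch_sketch(data, annotation, func, **kwargs):
    """Apply `func` separately to the rows belonging to each batch."""
    transformed = np.empty_like(data, dtype=float)
    for batch in np.unique(annotation):
        rows = annotation == batch
        transformed[rows] = func(data[rows], **kwargs)
    return transformed

def center_columns(block, scale=1.0):
    # Illustrative transformation: subtract the batch mean of every column.
    return (block - block.mean(axis=0)) * scale

data = np.array([[1.0], [3.0], [10.0], [20.0]])
annotation = np.array(["b1", "b1", "b2", "b2"])
centered = apply_per_batch_sketch(data, annotation, center_columns)
print(centered.ravel())
```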
- spac.transformations.arcsinh_transformation(adata, input_layer=None, co_factor=None, percentile=None, output_layer='arcsinh', per_batch=False, annotation=None)[source]
Apply arcsinh transformation using a co-factor (fixed number) or a given percentile of each feature. This transformation can be applied to the entire dataset or per batch based on provided parameters.
Computes the co-factor or percentile for each biomarker individually, ensuring proper scaling based on its unique range of expression levels.
- Parameters:
adata (anndata.AnnData) – The AnnData object containing the data to transform.
input_layer (str, optional) – The name of the layer in the AnnData object to transform. If None, the main data matrix .X is used.
co_factor (float, optional) – A fixed positive number to use as a co-factor for the transformation.
percentile (float, optional) – The percentile is computed for each feature (column) individually.
output_layer (str, default="arcsinh") – Name of the layer to put the transformed results. If it already exists, it will be overwritten with a warning.
per_batch (bool, optional, default=False) – Whether to apply the transformation per batch.
annotation (str, optional) – The name of the annotation in adata to define batches. Required if per_batch is True.
- Returns:
adata – The AnnData object with the transformed data stored in the specified output_layer.
- Return type:
anndata.AnnData
- spac.transformations.arcsinh_transformation_core(data, co_factor=None, percentile=None)[source]
Apply arcsinh transformation using a co-factor or a percentile.
- Parameters:
data (np.ndarray) – The data to transform.
co_factor (float, optional) – A fixed positive number to use as a co-factor for the transformation.
percentile (float, optional) – The percentile to determine the co-factor if co_factor is not provided. The percentile is computed for each feature (column) individually.
- Returns:
The transformed data.
- Return type:
np.ndarray
- Raises:
ValueError – If both co_factor and percentile are None. If both co_factor and percentile are specified. If percentile is not in the range [0, 100].
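A minimal NumPy sketch of the transformation and its argument checks, assuming the common convention arcsinh(data / co_factor), with the co-factor taken per column when a percentile is given. The name arcsinh_sketch is hypothetical and the error messages are invented.

```python
import numpy as np

def arcsinh_sketch(data, co_factor=None, percentile=None):
    """Apply arcsinh(data / co_factor); derive co_factor per column if needed."""
    if co_factor is None and percentile is None:
        raise ValueError("Provide either co_factor or percentile.")
    if co_factor is not None and percentile is not None:
        raise ValueError("Provide only one of co_factor or percentile.")
    if percentile is not None:
        if not 0 <= percentile <= 100:
            raise ValueError("percentile must be in the range [0, 100].")
        # One co-factor per feature (column).
        co_factor = np.percentile(data, percentile, axis=0)
    return np.arcsinh(data / co_factor)

data = np.array([[0.0, 10.0], [5.0, 20.0]])
fixed = arcsinh_sketch(data, co_factor=5.0)
per_feature = arcsinh_sketch(data, percentile=100)
print(fixed)
print(per_feature)
```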
- spac.transformations.batch_normalize(adata, annotation, output_layer, input_layer=None, method='median', log=False)[source]
Adjust the features of every marker using a normalization method.
The normalization methods are summarized here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8723144/
- Parameters:
adata (anndata.AnnData) – The AnnData object.
annotation (str) – The name of the annotation in adata to define batches.
output_layer (str) – The name of the new layer to add to the anndata object for storing normalized data.
input_layer (str, optional) – The name of the layer from which to read data. If None, read from .X.
method ({"median", "Q50", "Q75", "z-score"}, default "median") – The normalization method to use.
log (bool, default False) – If True, take the log2 of features before normalization. Ensure this is boolean.
- spac.transformations.get_cluster_info(adata, annotation, features=None, layer=None)[source]
Retrieve information about clusters based on specific annotation.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
annotation (str) – Annotation in adata.obs for cluster info.
features (list of str, optional) – Features (e.g., markers) for cluster metrics. Defaults to all features in adata.var_names.
layer (str, optional) – The layer to be used in the aggregate summaries. If None, uses adata.X.
- Returns:
DataFrame with metrics for each cluster including the percentage of each cluster to the whole sample.
- Return type:
pd.DataFrame
- spac.transformations.normalize_features(adata, low_quantile=0.02, high_quantile=0.98, interpolation='linear', input_layer=None, output_layer='normalized_feature', per_batch=False, annotation=None)[source]
Normalize the features stored in an AnnData object. Any entry lower than the value corresponding to low_quantile of the column is set to that quantile value, and any entry greater than the value corresponding to high_quantile is set to that quantile value. Other entries are normalized with (values - quantile min)/(quantile max - quantile min). The resulting column has values in the range [0, 1].
- spac.transformations.normalize_features_core(data, low_quantile=0.02, high_quantile=0.98, interpolation='linear')[source]
Normalize the features in a numpy array.
Any entry lower than the value corresponding to low_quantile of the column is set to that quantile value, and any entry greater than the value corresponding to high_quantile is set to that quantile value. Other entries are normalized with (values - quantile min)/(quantile max - quantile min). The resulting column has values in the range [0, 1].
- Parameters:
data (np.ndarray) – The data to be normalized.
low_quantile (float, optional (default: 0.02)) – The lower quantile to use for normalization. Determines the minimum value after normalization. Must be a positive float between [0,1).
high_quantile (float, optional (default: 0.98)) – The higher quantile to use for normalization. Determines the maximum value after normalization. Must be a positive float between (0,1].
interpolation (str, optional (default: "linear")) – The interpolation method to use when selecting the value for low and high quantile. Values can be “nearest” or “linear”.
- Returns:
The normalized data.
- Return type:
np.ndarray
- Raises:
TypeError – If low_quantile or high_quantile are not numeric.
ValueError – If low_quantile is not less than high_quantile, or if they are out of the range [0, 1] and (0, 1], respectively.
ValueError – If interpolation is not one of the allowed values.
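The clip-and-rescale rule described above can be sketched in NumPy as follows. This is an illustration rather than the spac implementation: normalize_sketch is a hypothetical name, only linear interpolation (NumPy's default) is shown, and mapping a constant column to 0 is an assumption.

```python
import numpy as np


def normalize_sketch(data, low_quantile=0.02, high_quantile=0.98):
    """Illustrative quantile clip-and-rescale; not the spac source code."""
    if not low_quantile < high_quantile:
        raise ValueError("low_quantile must be less than high_quantile.")
    data = np.asarray(data, dtype=float)
    # Per-column quantile values (NumPy's default linear interpolation).
    q_min = np.quantile(data, low_quantile, axis=0)
    q_max = np.quantile(data, high_quantile, axis=0)
    # Clip each column to its own [q_min, q_max] interval.
    clipped = np.clip(data, q_min, q_max)
    # Assumption: a constant column (q_max == q_min) maps to 0.
    span = np.where(q_max > q_min, q_max - q_min, 1.0)
    return (clipped - q_min) / span


data = np.array([
    [0.0, 100.0],
    [1.0, 200.0],
    [2.0, 300.0],
    [3.0, 400.0],
    [100.0, 500.0],
])
normalized = normalize_sketch(data, low_quantile=0.25, high_quantile=0.75)
```

In the first column the outlier 100.0 is clipped to the 0.75 quantile (3.0) before rescaling, so it ends up at 1 instead of compressing the other values toward 0.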
- spac.transformations.phenograph_clustering(adata, features, layer=None, k=50, seed=None, output_annotation='phenograph', associated_table=None, **kwargs)[source]
Calculate automatic phenotypes using phenograph.
The function adds these two attributes to adata:
- .obs["phenograph"]: the int64 class assigned by phenograph
- .uns["phenograph_features"]: the features used to calculate the phenograph clusters
- Parameters:
adata (anndata.AnnData) – The AnnData object.
features (list of str) – The variables that would be included in creating the phenograph clusters.
layer (str, optional) – The layer to be used in calculating the phenograph clusters.
k (int, optional) – The number of nearest neighbors to be used in creating the graph.
seed (int, optional) – Random seed for reproducibility.
output_annotation (str, optional) – The name of the output annotation in adata.obs where the clusters are stored.
associated_table (str, optional) – If set, use the corresponding key in adata.obsm to calculate the phenograph clusters. Takes priority over the layer argument.
- Returns:
adata – Updated AnnData object with the phenograph clusters stored in adata.obs[output_annotation]
- Return type:
anndata.AnnData
- spac.transformations.rename_annotations(adata, src_annotation, dest_annotation, mappings)[source]
Rename labels in a given annotation in an AnnData object based on a provided dictionary. This function modifies the adata object in-place and creates a new annotation column.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
src_annotation (str) – Name of the column in adata.obs containing the original labels of the source annotation.
dest_annotation (str) – The name of the new column to be created in the AnnData object containing the renamed labels.
mappings (dict) – A dictionary mapping the original annotation labels to the new labels.
Examples
>>> adata = your_anndata_object
>>> src_annotation = "phenograph"
>>> mappings = {
...     "0": "group_8",
...     "1": "group_2",
...     "2": "group_6",
...     # ...
...     "37": "group_5",
... }
>>> dest_annotation = "renamed_annotations"
>>> adata = rename_annotations(
...     adata, src_annotation, dest_annotation, mappings)
- spac.transformations.run_umap(adata, n_neighbors=75, min_dist=0.1, n_components=2, metric='euclidean', random_state=0, transform_seed=42, layer=None, output_derived_feature='X_umap', associated_table=None, **kwargs)[source]
Perform UMAP analysis on the specific layer of the AnnData object or the default data.
- Parameters:
adata (AnnData) – Annotated data matrix.
n_neighbors (int, default=75) – Number of neighbors to consider when constructing the UMAP. This influences the balance between preserving local and global structures in the data.
min_dist (float, default=0.1) – Minimum distance between points in the UMAP space. Controls how tightly the embedding is allowed to compress points together.
n_components (int, default=2) – Number of dimensions for embedding.
metric (str, optional) – Metric to compute distances in high dimensional space. Check https://umap-learn.readthedocs.io/en/latest/api.html for options. The default is ‘euclidean’.
random_state (int, default=0) – Seed used by the random number generator(RNG) during UMAP fitting.
transform_seed (int, default=42) – RNG seed during UMAP transformation.
layer (str, optional) – Layer of AnnData object for UMAP. Defaults to adata.X.
output_derived_feature (str, default='X_umap') – The name of the column in adata.obsm that will contain the umap coordinates.
associated_table (str, optional) – If set, use the corresponding key in adata.obsm to calculate the UMAP. Takes priority over the layer argument.
- Returns:
adata – Updated AnnData object with UMAP coordinates stored in the obsm attribute. The key for the UMAP embedding in obsm is “X_umap” by default.
- Return type:
anndata.AnnData
- spac.transformations.run_utag_clustering(adata, features=None, k=15, resolution=1, max_dist=20, n_pcs=10, random_state=42, n_jobs=1, n_iterations=5, slide_key='Slide', layer=None, output_annotation='UTAG', associated_table=None, parallel=False, **kwargs)[source]
Run UTAG clustering on the AnnData object.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
features (list) – List of features to use for clustering or for PCA. Default (None) is to use all.
k (int) – The number of nearest neighbors to be used in creating the graph. Default is 15.
resolution (float) – Resolution parameter for the clustering, higher resolution produces more clusters. Default is 1.
max_dist (float) – Maximum distance to cut edges within a graph. Default is 20.
n_pcs (int) – Number of principal components to use for clustering. Default is 10.
random_state (int) – Random state for reproducibility.
n_jobs (int) – Number of jobs to run in parallel. Default is 1.
n_iterations (int) – Number of iterations for the clustering.
slide_key (str) – Key of adata.obs containing information on the batch structure of the data. In general, for image data this will often be a variable indicating the image so image-specific effects are removed from data. Default is “Slide”.
- Returns:
adata – Updated AnnData object with clustering results.
- Return type:
anndata.AnnData
- spac.transformations.tsne(adata, layer=None, **kwargs)[source]
Perform t-SNE transformation on specific layer information.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
layer (str) – Layer to use for the t-SNE calculation.
**kwargs – Parameters for scanpy.tl.tsne function.
- Returns:
adata – Updated AnnData object with t-SNE coordinates.
- Return type:
anndata.AnnData
- spac.transformations.z_score_normalization(adata, output_layer, input_layer=None, **kwargs)[source]
Compute z-scores for the provided AnnData object.
- Parameters:
adata (anndata.AnnData) – The AnnData object containing the data to normalize.
output_layer (str) – The name of the layer to store the computed z-scores.
input_layer (str, optional) – The name of the layer in the AnnData object to normalize. If None, the main data matrix .X is used.
**kwargs (dict, optional) – Additional arguments to pass to scipy.stats.zscore.
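As a rough illustration of what the stored layer contains, the sketch below reproduces the behaviour of scipy.stats.zscore with its default arguments (axis=0, ddof=0) using plain NumPy. The name z_score_sketch is hypothetical, and it is an assumption that no extra keyword arguments are passed.

```python
import numpy as np


def z_score_sketch(matrix):
    """Column-wise z-scores, mirroring scipy.stats.zscore defaults."""
    matrix = np.asarray(matrix, dtype=float)
    # scipy.stats.zscore defaults: axis=0 and ddof=0 (population std).
    return (matrix - matrix.mean(axis=0)) / matrix.std(axis=0)


# Hypothetical feature matrix: rows are cells, columns are features.
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
z = z_score_sketch(features)
```

Each column of the result has mean 0 and standard deviation 1, so features measured on different scales become directly comparable.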
spac.utag_functions module
- spac.utag_functions.add_probabilities_to_centroid(adata, col: str, name_to_output: str | None = None)[source]
- spac.utag_functions.sparse_matrix_dstack(matrices: Sequence[csr_matrix]) csr_matrix [source]
Diagonally stack sparse matrices.
- spac.utag_functions.utag(adata, channels_to_use=None, slide_key='Slide', save_key: str = 'UTAG Label', filter_by_variance: bool = False, max_dist: float = 20.0, normalization_mode: str = 'l1_norm', keep_spatial_connectivity: bool = False, n_pcs=10, apply_umap: bool = False, umap_kwargs: Dict[str, Any] = {}, apply_clustering: bool = True, clustering_method: Sequence[str] = ['leiden'], resolutions: Sequence[float] = [0.05, 0.1, 0.3, 1.0], leiden_kwargs: Dict[str, Any] | None = None, parc_kwargs: Dict[str, Any] | None = None, parallel: bool = False, processes: int = 1, k=15, random_state=42)[source]
Discover tissue architecture in single-cell imaging data by combining phenotypes and positional information of cells.
- Parameters:
adata (AnnData) – AnnData object with spatial positioning of cells in obsm ‘spatial’ slot.
channels_to_use (Optional[Sequence[str]]) – An optional sequence of strings used to subset variables to use. Default (None) is to use all variables.
max_dist (float) – Maximum distance to cut edges within a graph. Should be adjusted depending on the resolution of the images. For imaging mass cytometry, where resolution is 1um, 20 often gives good results. Recommended values are between 20 and 50 depending on magnification. Default is 20.
slide_key ({str, None}) – Key of adata.obs containing information on the batch structure of the data. In general, for image data this will often be a variable indicating the image so image-specific effects are removed from data. Default is “Slide”.
save_key (str) – Key to be added to adata object holding the UTAG clusters. Depending on the values of clustering_method and resolutions, the final keys will be of the form: “{save_key}_{method}_{resolution}”. Default is “UTAG Label”.
filter_by_variance (bool) – Whether to filter variables by variance. Default is False, which keeps all variables.
normalization_mode (str) – Method to normalize adjacency matrix. Default is “l1_norm”, any other value will not use normalization.
keep_spatial_connectivity (bool) – Whether to keep sparse matrices of spatial connectivity and distance in the obsp attribute of the resulting anndata object. This could be useful in downstream applications. Default is not to (False).
n_pcs – Number of principal components to use for clustering. Default is 10. If None, no PCA is performed and clustering is done on features.
apply_umap (bool) – Whether to build a UMAP representation after message passing. Default is False.
umap_kwargs (Dict[str, Any]) – Keyword arguments to be passed to scanpy.tl.umap for dimensionality reduction after message passing. Default is {}.
apply_clustering (bool) – Whether to cluster the message passed matrix. Default is True.
clustering_method (Sequence[str]) – Which clustering method(s) to use for clustering of the message passed matrix. Default is [“leiden”].
resolutions (Sequence[float]) – What resolutions should the methods in clustering_method be run at. Default is [0.05, 0.1, 0.3, 1.0].
leiden_kwargs (dict[str, Any]) – Keyword arguments to pass to scanpy.tl.leiden.
parc_kwargs (dict[str, Any]) – Keyword arguments to pass to parc.PARC.
parallel (bool) – Whether to run message passing part of algorithm in parallel. Will accelerate the process but consume more memory. Default is False.
processes (int) – Number of processes to use in parallel; -1 uses all available. Default is 1.
- Returns:
adata – AnnData object with UTAG domain predictions for each cell in adata.obs, column save_key.
- Return type:
AnnData
spac.utils module
- spac.utils.annotation_category_relations(adata, source_annotation, target_annotation, prefix=False)[source]
Calculates the count of unique relationships between two annotations in an AnnData object. Relationship is defined as a unique pair of values, one from the ‘source_annotation’ and one from the ‘target_annotation’.
Returns a DataFrame with columns ‘source_annotation’, ‘target_annotation’, ‘count’, ‘percentage_source’, and ‘percentage_target’, where ‘count’ is the number of occurrences of each relationship, ‘percentage_source’ is the count of the link as a percentage of the total count of the source label, and ‘percentage_target’ is the count of the link as a percentage of the total count of the target label.
If prefix is set to True, the “source_” and “target_” prefixes are added to labels in the “source” and “target” columns, respectively.
- Parameters:
adata (AnnData) – The annotated data matrix of shape n_obs * n_vars. Rows correspond to cells and columns to genes.
source_annotation (str) – The name of the source annotation column in the adata object.
target_annotation (str) – The name of the target annotation column in the adata object.
prefix (bool, optional) – If True, adds the “source_” and “target_” prefixes to labels in the “source” and “target” columns, respectively.
- Returns:
relationships – A DataFrame with the source and target categories, their counts and their percentages.
- Return type:
pandas.DataFrame
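The counting logic described above can be sketched with the standard library alone. This is not the spac implementation: relations_sketch, the dictionary keys, and the sample labels are hypothetical, and expressing percentages on a 0 to 100 scale is an assumption.

```python
from collections import Counter


def relations_sketch(source_labels, target_labels):
    """Count unique (source, target) pairs and their percentages."""
    pair_counts = Counter(zip(source_labels, target_labels))
    source_totals = Counter(source_labels)
    target_totals = Counter(target_labels)
    rows = []
    for (source, target), count in sorted(pair_counts.items()):
        rows.append({
            "source": source,
            "target": target,
            "count": count,
            # Share of this link among all cells with the source label.
            "percentage_source": 100 * count / source_totals[source],
            # Share of this link among all cells with the target label.
            "percentage_target": 100 * count / target_totals[target],
        })
    return rows


# Hypothetical labels for four cells.
clusters = ["0", "0", "1", "1"]
cell_types = ["B", "T", "T", "T"]
rows = relations_sketch(clusters, cell_types)
```

Here cluster "1" maps entirely to cell type "T" (percentage_source of 100), while those two cells make up two thirds of all "T" cells (percentage_target).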
- spac.utils.check_annotation(adata, annotations=None, parameter_name=None, should_exist=True)[source]
Perform common error checks for annotations in anndata related objects.
- Parameters:
adata (anndata.AnnData) – The AnnData object to be checked.
annotations (str or list of str, optional) – The annotation(s) to check for existence in adata.obs.
should_exist (bool, optional (default=True)) – Determines whether to check if elements exist in the target list (True), or if they should not exist (False).
- Raises:
TypeError – If adata is not an instance of anndata.AnnData.
ValueError – If any of the specified layers, annotations, or features do not exist.
- spac.utils.check_distances(distances)[source]
Check that the distances are valid: must be an array-like of incremental positive values.
- Parameters:
distances (list, tuple, or np.ndarray) – The list of increasing distances for the neighborhood profile.
- Returns:
Raises a ValueError or TypeError if the distances are invalid.
- Return type:
None
Notes
The distances must be a list of positive real numbers and must be monotonically increasing.
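A minimal sketch of this kind of validation is shown below. It is not the spac source code: check_distances_sketch is a hypothetical name, np.ndarray input is left out to keep the example free of dependencies, and treating "positive" as strictly greater than zero and "increasing" as strictly increasing are assumptions.

```python
from numbers import Real


def check_distances_sketch(distances):
    """Illustrative validation; not the spac source code."""
    if not isinstance(distances, (list, tuple)):
        raise TypeError("distances must be a list or tuple.")
    for value in distances:
        # bool is a Real subclass, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"Non-numeric distance: {value!r}")
        if value <= 0:
            raise ValueError(f"Distances must be positive, got {value!r}")
    # Compare each value with its predecessor.
    for previous, current in zip(distances, distances[1:]):
        if current <= previous:
            raise ValueError("Distances must be strictly increasing.")


check_distances_sketch([10, 25.5, 50])  # valid input: returns None
```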
- spac.utils.check_feature(adata, features=None, should_exist=True)[source]
Perform common error checks for features in anndata related objects.
- Parameters:
adata (anndata.AnnData) – The AnnData object to be checked.
features (str or list of str, optional) – The feature(s) to check for existence in adata.var_names.
should_exist (bool, optional (default=True)) – Determines whether to check if elements exist in the target list (True), or if they should not exist (False).
- Raises:
TypeError – If adata is not an instance of anndata.AnnData.
ValueError – If any of the specified layers, annotations, or features do not exist.
- spac.utils.check_label(adata, annotation, labels=None, should_exist=True, warning=False)[source]
Check if specified labels exist in a given annotation column in adata.obs.
This function verifies whether all or none of the specified labels exist in the provided annotation column of an AnnData object. It ensures that the input labels align with the expected categories present in adata.obs[annotation].
- Parameters:
adata (anndata.AnnData) – The AnnData object containing the annotation column.
annotation (str) – The name of the annotation column in adata.obs to check against.
labels (str or list of str, optional) – The label or list of labels to check for existence in the specified annotation column. If None, no validation will be performed.
should_exist (bool, optional (default=True)) – Determines whether to check if elements exist in the target column (True), or if they should not exist (False).
warning (bool, optional (default=False)) – If True, generate a warning instead of raising an exception if the specified condition is not met.
- Raises:
TypeError – If adata is not an instance of anndata.AnnData.
ValueError – If the specified annotation does not exist in adata.obs. If should_exist is True and any label does not exist in the annotation column. If should_exist is False and any label already exists in the annotation column.
- Warns:
UserWarning – If the specified condition is not met and warning is True.
Example
>>> check_label(adata, "cell_type", "B_cell")
>>> check_label(
...     adata, "cluster", ["Cluster1", "Cluster2"], should_exist=True
... )
- spac.utils.check_list_in_list(input, input_name, input_type, target_list, need_exist=True, warning=False)[source]
Check if items in a given list exist in a target list.
This function is used to validate whether all or none of the items in a given list exist in a target list. It helps to ensure that the input list contains only valid elements that are present in the target list.
- Parameters:
input (str or list of str or None) – The input list or a single string element. If it is a string, it will be converted to a list containing only that string element. If None, no validation will be performed.
input_name (str) – The name of the input list used for displaying helpful error messages.
input_type (str) – The type of items in the input list (e.g., “item”, “element”, “category”).
target_list (list of str) – The target list containing valid items that the input list elements should be compared against.
need_exist (bool, optional (default=True)) – Determines whether to check if elements exist in the target list (True), or if they should not exist (False).
warning (bool, optional (default=False)) – If True, generate a warning instead of raising an exception.
- Raises:
ValueError – If the input is not a string or a list of strings. If need_exist is True and any element of the input list does not exist in the target list. If need_exist is False and any element of the input list exists in the target list.
- Warns:
UserWarning – If the specified behavior is not present and warning is True.
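The behaviour documented above can be sketched as follows. This is an illustration only: check_items_sketch is a hypothetical name, the error message wording is invented, and the spac implementation may differ in details.

```python
import warnings


def check_items_sketch(items, input_name, input_type, target_list,
                       need_exist=True, warning=False):
    """Illustrative membership check; not the spac source code."""
    if items is None:
        return  # nothing to validate
    if isinstance(items, str):
        items = [items]  # a single string becomes a one-element list
    if not isinstance(items, list) or not all(
            isinstance(item, str) for item in items):
        raise ValueError(
            f"{input_name} must be a string or a list of strings.")
    if need_exist:
        offending = [item for item in items if item not in target_list]
        problem = "not found in"
    else:
        offending = [item for item in items if item in target_list]
        problem = "already present in"
    if offending:
        message = (
            f"The {input_type}(s) {offending} {problem} the target list.")
        if warning:
            warnings.warn(message, UserWarning)
        else:
            raise ValueError(message)


# A single existing item passes silently.
check_items_sketch("CD4", "features", "feature", ["CD4", "CD8"])
```

Setting need_exist=False inverts the check, which is useful before creating a new layer or annotation whose name must not collide with an existing one.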
- spac.utils.check_table(adata, tables=None, should_exist=True, associated_table=False, warning=False)[source]
Perform common error checks for table (layers) or derived tables (obsm) in anndata related objects.
- Parameters:
adata (anndata.AnnData) – The AnnData object to be checked.
tables (str or list of str, optional) – The term “table” is equivalent to layer in anndata structure. The layer(s) to check for existence in adata.layers.keys().
should_exist (bool, optional (default=True)) – Determines whether to check if elements exist in the target list (True), or if they should not exist (False).
associated_table (bool, optional (default=False)) – Determines whether the passed table names should exist as layers or in obsm in the anndata object.
warning (bool, optional (default=False)) – If True, generate a warning instead of raising an exception.
- Raises:
TypeError – If adata is not an instance of anndata.AnnData.
ValueError – If any of the specified layers, annotations, obsm, or features do not exist.
- Warns:
UserWarning – If any of the specified layers, annotations, obsm, or features do not exist, and warning is True.
- spac.utils.color_mapping(labels, color_map='viridis', opacity=1.0, rgba_mode=True, return_dict=False)[source]
Map a list of labels to colors using a Matplotlib colormap and opacity.
This function assigns a unique color to each label in the provided list using a specified colormap from Matplotlib. The generated colors can be returned in either rgba or rgb format, suitable for visualization in libraries like Plotly.
The function supports both continuous and discrete colormaps:
- Continuous colormaps interpolate smoothly between colors across a range.
- Discrete colormaps have a fixed number of distinct colors, and labels are distributed evenly across these colors.
Opacity can be set with a value between 0 (fully transparent) and 1 (fully opaque). The resulting colors are CSS-compatible strings.
- Parameters:
labels (list) – A list of unique labels to map to colors. The number of labels determines how the colormap is sampled.
color_map (str, optional) – The colormap name (e.g., ‘viridis’, ‘plasma’, ‘inferno’). It must be a valid Matplotlib colormap. Default is ‘viridis’.
opacity (float, optional) – Opacity (alpha channel) for colors, between 0 (transparent) and 1 (opaque). Default is 1.0.
rgba_mode (bool, optional) – If True, returns colors in rgba format (e.g., rgba(255, 0, 0, 0.5)). If False, returns rgb format (e.g., rgb(255, 0, 0)). Default is True.
return_dict (bool, optional) – If True, returns a dictionary where keys are labels, and values are the corresponding colors. Default is False.
- Returns:
label_colors – If return_dict is False, returns a list of color strings, one for each label. If return_dict is True, returns a dictionary with label keys and color values. The format of the colors depends on the rgba_mode parameter.
- Return type:
list[str] or dict
- Raises:
ValueError –
- If opacity is not in the range [0, 1].
- If color_map is not a valid Matplotlib colormap name.
Examples
Assign colors to labels with default settings:
>>> labels = ['A', 'B', 'C']
>>> color_mapping(labels)
['rgba(68, 1, 84, 1.0)', 'rgba(58, 82, 139, 1.0)', 'rgba(33, 145, 140, 1.0)']
Use a different colormap with reduced opacity:
>>> color_mapping(labels, color_map='plasma', opacity=0.5)
['rgba(13, 8, 135, 0.5)', 'rgba(126, 3, 167, 0.5)', 'rgba(240, 249, 33, 0.5)']
Generate colors in rgb format:
>>> color_mapping(labels, rgba_mode=False)
['rgb(68, 1, 84)', 'rgb(58, 82, 139)', 'rgb(33, 145, 140)']
Return a dictionary of labels and colors:
>>> color_mapping(labels, return_dict=True)
{'A': 'rgba(68, 1, 84, 1.0)', 'B': 'rgba(58, 82, 139, 1.0)', 'C': 'rgba(33, 145, 140, 1.0)'}
Notes
Continuous colormaps interpolate colors evenly based on the number of labels.
Discrete colormaps divide labels evenly across available colors.
For more information on Matplotlib colormaps: https://matplotlib.org/stable/users/explain/colors/colormaps.html
- spac.utils.regex_search_list(regex_patterns, list_to_search)[source]
Perform a regex (regular expression) search in a list and return the list of strings matching the regex pattern.
- Parameters:
regex_patterns (str or a list of str) – The regex pattern to look for, single pattern or a list of patterns.
list_to_search (list of str) – A list of strings to search for strings that match the regex pattern.
- Returns:
A list of strings containing results from search.
- Return type:
list of str
Example
>>> regex_pattern = ["A", "^B.*"]
>>> list_to_search = ["ABC", "BC", "AC", "AB"]
>>> result = regex_search_list(regex_pattern, list_to_search)
>>> print(result)
['BC']
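The documented result ['BC'] is consistent with full-string matching, where the pattern "A" does not select "ABC", "AC", or "AB". The sketch below reproduces that behaviour with re.fullmatch. Using fullmatch is an assumption inferred from the example output, and regex_search_sketch is a hypothetical name rather than the spac function.

```python
import re


def regex_search_sketch(regex_patterns, list_to_search):
    """Illustrative full-string regex filter; not the spac source code."""
    if isinstance(regex_patterns, str):
        regex_patterns = [regex_patterns]
    compiled = [re.compile(pattern) for pattern in regex_patterns]
    # Assumption: a string is kept only when a pattern matches all of it.
    return [
        text for text in list_to_search
        if any(pattern.fullmatch(text) for pattern in compiled)
    ]


result = regex_search_sketch(["A", "^B.*"], ["ABC", "BC", "AC", "AB"])
print(result)  # ['BC']
```

Under this reading, selecting every string that starts with "A" requires a pattern such as "A.*" instead of "A".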
- spac.utils.spell_out_special_characters(text)[source]
Convert special characters in a string to comply with NIDAP naming rules.
This function processes a string by replacing or removing disallowed characters to ensure compatibility with NIDAP. Spaces, special symbols, and certain substrings are replaced or transformed into readable and regulation-compliant equivalents.
- Parameters:
text (str) – The input string to be processed and converted.
- Returns:
str – A sanitized string with special characters replaced or removed, adhering to NIDAP naming conventions.
Processing Steps
1. Spaces are replaced with underscores (_).
2. Substrings related to units (e.g., ‘µm²’) are replaced with text equivalents:
   - ‘µm²’ -> ‘um2’
   - ‘µm’ -> ‘um’
3. Hyphens (-) between letters are replaced with underscores (_).
4. Certain special symbols are mapped to readable equivalents:
   - + -> _pos_
   - - -> _neg_
   - @ -> at
   - # -> hash
   - & -> and
   - And more (see Notes section for a full mapping).
5. Remaining disallowed characters are removed (non-alphanumeric and non-underscore characters).
6. Consecutive underscores are consolidated into a single underscore.
7. Leading and trailing underscores are stripped.
Notes
The following special character mappings are used:
- µ -> u
- ² -> 2
- / -> slash
- = -> equals
- ! -> exclamation
- | -> pipe
- For a complete list, refer to the special_char_map in the code.
Example
>>> spell_out_special_characters("Data µm²+Analysis #1-2")
'Data_um2_pos_Analysis_hash1_neg_2'
>>> spell_out_special_characters("Invalid!Char@Format")
'Invalid_exclamation_Char_at_Format'
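The seven processing steps can be sketched with the standard library as shown below. This is a simplified illustration, not the spac implementation: spell_out_sketch and SYMBOL_MAP_SKETCH are hypothetical names, and only the + and - entries of the symbol mapping are reproduced, so other symbols are simply dropped in step 5.

```python
import re

# Assumption: only a subset of the symbol mapping is reproduced here.
SYMBOL_MAP_SKETCH = {"+": "_pos_", "-": "_neg_"}


def spell_out_sketch(text):
    """Illustrative version of the listed steps; not the spac source code."""
    # Step 1: spaces become underscores.
    text = text.replace(" ", "_")
    # Step 2: unit substrings become text equivalents (longest first).
    text = text.replace("µm²", "um2").replace("µm", "um")
    # Step 3: hyphens between letters become underscores.
    text = re.sub(r"(?<=[A-Za-z])-(?=[A-Za-z])", "_", text)
    # Step 4: remaining mapped symbols are spelled out.
    for symbol, word in SYMBOL_MAP_SKETCH.items():
        text = text.replace(symbol, word)
    # Step 5: drop anything that is not alphanumeric or an underscore.
    text = re.sub(r"[^A-Za-z0-9_]", "", text)
    # Step 6: collapse runs of underscores.
    text = re.sub(r"_+", "_", text)
    # Step 7: strip leading and trailing underscores.
    return text.strip("_")


print(spell_out_sketch("CD4+ T-cell area (µm²)"))  # CD4_pos_T_cell_area_um2
```

The order matters: the hyphen in "T-cell" is handled in step 3 before the generic - -> _neg_ mapping in step 4, which is why it becomes an underscore and not "_neg_".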
spac.visualization module
- spac.visualization.boxplot(adata, annotation=None, second_annotation=None, layer=None, ax=None, features=None, log_scale=False, **kwargs)[source]
Create a boxplot visualization of the features in the passed adata object. This function offers flexibility in how the boxplots are displayed, based on the arguments provided.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
annotation (str, optional) – Annotation to determine if separate plots are needed for every label.
second_annotation (str, optional) – Second annotation to further divide the data.
layer (str, optional) – The name of the matrix layer to use. If not provided, uses the main data matrix adata.X.
ax (matplotlib.axes.Axes, optional) – An existing Axes object to draw the plot onto, optional.
features (list, optional) – List of feature names to be plotted. If not provided, all features will be plotted.
log_scale (bool, optional) – If True, the Y-axis will be in log scale. Default is False.
**kwargs – Additional arguments to pass to seaborn.boxplot. Key arguments include:
- orient: Determines the orientation of the plot.
  - “v”: Vertical orientation (default). In this case, categorical data will be plotted on the x-axis, and the boxplots will be vertical.
  - “h”: Horizontal orientation. Categorical data will be plotted on the y-axis, and the boxplots will be horizontal.
- Returns:
fig, ax – The created figure and axes for the plot.
- Return type:
matplotlib.figure.Figure, matplotlib.axes.Axes
Examples
Multiple features boxplot: boxplot(adata, features=['GeneA', 'GeneB'])
Boxplot grouped by a single annotation: boxplot(adata, features=['GeneA'], annotation='cell_type')
Boxplot for multiple features grouped by a single annotation: boxplot(adata, features=['GeneA', 'GeneB'], annotation='cell_type')
Nested grouping by two annotations: boxplot(adata, features=['GeneA'], annotation='cell_type', second_annotation='treatment')
- spac.visualization.dimensionality_reduction_plot(adata, method=None, annotation=None, feature=None, layer=None, ax=None, associated_table=None, **kwargs)[source]
Visualize scatter plot in PCA, t-SNE, UMAP, or associated table.
- Parameters:
adata (anndata.AnnData) – The AnnData object with coordinates precomputed by the ‘tsne’ or ‘UMAP’ function and stored in ‘adata.obsm[“X_tsne”]’ or ‘adata.obsm[“X_umap”]’
method (str, optional (default: None)) – Dimensionality reduction method to visualize. Choose from {‘tsne’, ‘umap’, ‘pca’}.
annotation (str, optional) – The name of the column in adata.obs to use for coloring the scatter plot points based on cell annotations.
feature (str, optional) – The name of the gene or feature in adata.var_names to use for coloring the scatter plot points based on feature expression.
layer (str, optional) – The name of the data layer in adata.layers to use for visualization. If None, the main data matrix adata.X is used.
ax (matplotlib.axes.Axes, optional (default: None)) – A matplotlib axes object to plot on. If not provided, a new figure and axes will be created.
associated_table (str, optional (default: None)) – Name of the key in obsm that contains the numpy array. Takes precedence over method
**kwargs – Parameters passed to visualize_2D_scatter function, including point_size.
- Returns:
fig (matplotlib.figure.Figure) – The created figure for the plot.
ax (matplotlib.axes.Axes) – The axes of the plot.
- spac.visualization.heatmap(adata, column, layer=None, **kwargs)[source]
Plot the heatmap of the mean feature of cells that belong to a column.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
column (str) – Name of the member of adata.obs to plot the heatmap for.
layer (str, default None) – The name of the adata layer to use to calculate the mean feature.
**kwargs – Parameters passed to seaborn heatmap function.
- Returns:
pandas.DataFrame – A dataframe that has the labels as indexes and the mean feature for every marker.
matplotlib.figure.Figure – The figure of the heatmap.
matplotlib.axes._subplots.AxesSubplot – The AxesSubplot of the heatmap.
- spac.visualization.hierarchical_heatmap(adata, annotation, features=None, layer=None, cluster_feature=False, cluster_annotations=False, standard_scale=None, z_score='annotation', swap_axes=False, rotate_label=False, **kwargs)[source]
Generates a hierarchical clustering heatmap and dendrogram. By default, the dataset is assumed to have features as columns and annotations as rows. Cells are grouped by annotation (e.g., phenotype), and for each group, the average expression intensity of each feature (e.g., protein or marker) is computed. The heatmap is plotted using seaborn’s clustermap.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
annotation (str) – Name of the annotation in adata.obs to group by and calculate mean intensity.
features (list or None, optional) – List of feature names (e.g., markers) to be included in the visualization. If None, all features are used. Default is None.
layer (str, optional) – The name of the adata layer to use to calculate the mean intensity. If not provided, uses the main matrix. Default is None.
cluster_feature (bool, optional) – If True, perform hierarchical clustering on the feature axis. Default is False.
cluster_annotations (bool, optional) – If True, perform hierarchical clustering on the annotations axis. Default is False.
standard_scale (int or None, optional) – Whether to standard scale data (0: row-wise or 1: column-wise). Default is None.
z_score (str, optional) – Specifies the axis for z-score normalization. Can be “feature” or “annotation”. Default is “annotation”.
swap_axes (bool, optional) – If True, switches the axes of the heatmap, effectively transposing the dataset. By default (when False), annotations are on the vertical axis (rows) and features are on the horizontal axis (columns). When set to True, features will be on the vertical axis and annotations on the horizontal axis. Default is False.
rotate_label (bool, optional) – If True, rotate x-axis labels by 45 degrees. Default is False.
**kwargs – Additional parameters passed to sns.clustermap function or its underlying functions. Some essential parameters include:
- cmap : colormap. Colormap to use for the heatmap. It’s an argument for the underlying sns.heatmap() used within sns.clustermap(). Examples include “viridis”, “plasma”, “coolwarm”, etc.
- {row,col}_colors : Lists or DataFrames. Colors to use for annotating the rows/columns. Useful for visualizing additional categorical information alongside the main heatmap.
- {dendrogram,colors}_ratio : tuple(float). Control the size proportions of the dendrogram and the color labels relative to the main heatmap.
- cbar_pos : tuple(float) or None. Specify the position and size of the colorbar in the figure. If set to None, no colorbar will be added.
- tree_kws : dict. Customize the appearance of the dendrogram tree. Passes additional keyword arguments to the underlying matplotlib.collections.LineCollection.
- method : str. The linkage algorithm to use for the hierarchical clustering. Defaults to ‘centroid’ in the function, but can be changed.
- metric : str. The distance metric to use for the hierarchy. Defaults to ‘euclidean’ in the function.
- Returns:
mean_intensity (pandas.DataFrame) – A DataFrame containing the mean intensity of cells for each annotation.
clustergrid (seaborn.matrix.ClusterGrid) – The seaborn ClusterGrid object representing the heatmap and dendrograms.
dendrogram_data (dict) – A dictionary containing hierarchical clustering linkage data for both rows and columns. These linkage matrices can be used to generate dendrograms with tools like scipy’s dendrogram function. This offers flexibility in customizing and plotting dendrograms as needed.
Examples
import matplotlib.pyplot as plt
import pandas as pd
import anndata
from spac.visualization import hierarchical_heatmap

X = pd.DataFrame([[1, 2], [3, 4]], columns=['gene1', 'gene2'])
annotation = pd.DataFrame(['type1', 'type2'], columns=['cell_type'])
all_data = anndata.AnnData(X=X, obs=annotation)

mean_intensity, clustergrid, dendrogram_data = hierarchical_heatmap(
    all_data, "cell_type", layer=None, z_score="annotation",
    swap_axes=True, cluster_feature=False, cluster_annotations=True
)

# To display a standalone dendrogram using the returned linkage matrix:
import scipy.cluster.hierarchy as sch
import numpy as np

# Convert the linkage data to type double
dendro_col_data = np.array(dendrogram_data['col_linkage'], dtype=np.double)

# Ensure the linkage matrix has at least two dimensions and more than one linkage
if dendro_col_data.ndim == 2 and dendro_col_data.shape[0] > 1:
    fig, ax = plt.subplots(figsize=(10, 7))
    sch.dendrogram(dendro_col_data, ax=ax)
    plt.title('Standalone Col Dendrogram')
    plt.show()
else:
    print("Insufficient data to plot a dendrogram.")
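The grouping step described above (cells grouped by annotation, then averaged per feature) can be sketched with plain pandas. This is an illustration only, not SPAC's implementation: the toy values, the column names, and the choice to z-score each feature column across the annotation groups are all assumptions, and which axis SPAC's z_score option maps to is not confirmed here.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): four cells, two features, one annotation.
intensities = pd.DataFrame(
    {"marker_a": [1.0, 3.0, 5.0, 7.0], "marker_b": [2.0, 2.0, 6.0, 10.0]}
)
cell_type = pd.Series(["type1", "type1", "type2", "type2"], name="cell_type")

# Group cells by annotation and average each feature within the group.
mean_intensity = intensities.groupby(cell_type).mean()

# Assumption: z-score each feature column across the annotation groups,
# using the population standard deviation (ddof=0).
z_scored = (mean_intensity - mean_intensity.mean()) / mean_intensity.std(ddof=0)

print(mean_intensity)
print(z_scored)
```

The resulting table has one row per annotation label and one column per feature, which matches the default orientation described for the heatmap (annotations as rows, features as columns).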
- spac.visualization.histogram(adata, feature=None, annotation=None, layer=None, group_by=None, together=False, ax=None, x_log_scale=False, y_log_scale=False, **kwargs)[source]
Plot the histogram of cells based on a specific feature from adata.X or annotation from adata.obs.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
feature (str, optional) – Name of continuous feature from adata.X to plot its histogram.
annotation (str, optional) – Name of the annotation from adata.obs to plot its histogram.
layer (str, optional) – Name of the layer in adata.layers to plot its histogram.
group_by (str, default None) – Name of another column to group the histogram by.
together (bool, default False) – If True, and if group_by != None, create one plot combining all groups. If False, create separate histograms for each group. The appearance of combined histograms can be controlled using the multiple and element parameters in **kwargs. To control how the histograms are normalized (e.g., to divide the histogram by the number of elements in every group), use the stat parameter in **kwargs. For example, set stat=”probability” to show the relative frequencies of each group.
ax (matplotlib.axes.Axes, optional) – An existing Axes object to draw the plot onto.
x_log_scale (bool, default False) – If True, the data will be transformed using np.log1p before plotting, and the x-axis label will be adjusted accordingly.
y_log_scale (bool, default False) – If True, the y-axis will be set to log scale.
**kwargs –
Additional keyword arguments passed to seaborn's histplot function. Key arguments include:
- multiple: Determines how the subsets of data are displayed on the same axes. Options include:
  - "layer": Draws each subset on top of the other without adjustments.
  - "dodge": Dodges bars for each subset side by side.
  - "stack": Stacks bars for each subset on top of each other.
  - "fill": Adjusts bar heights to fill the axes.
- element: Determines the visual representation of the bins. Options include:
  - "bars": Displays the typical bar-style histogram (default).
  - "step": Creates a step line plot without bars.
  - "poly": Creates a polygon where the bottom edge represents the x-axis and the top edge the histogram's bins.
- log_scale: Determines if the data should be plotted on a logarithmic scale.
- stat: Determines the statistical transformation to use on the data for the histogram. Options include:
  - "count": Show the counts of observations in each bin.
  - "frequency": Show the number of observations divided by the bin width.
  - "density": Normalize such that the total area of the histogram equals 1.
  - "probability": Normalize such that each bar's height reflects the probability of observing that bin.
- bins: Specification of the histogram bins. Can be a number (indicating the number of bins) or a list (indicating bin edges). For example, bins=10 will create 10 bins, while bins=[0, 1, 2, 3] will create bins [0,1), [1,2), [2,3]. If not provided, the binning will be determined automatically.
- Returns:
fig (matplotlib.figure.Figure) – The created figure for the plot.
axs (matplotlib.axes.Axes or list of Axes) – The Axes object(s) of the histogram plot(s). Returns a single Axes if only one plot is created, otherwise returns a list of Axes.
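To make the bins, stat and x_log_scale descriptions above concrete, here is a small NumPy-only sketch. It does not call SPAC or seaborn. The sample values are hypothetical, and the normalization is written by hand to mirror what stat="probability" is described as doing.

```python
import numpy as np

values = np.array([0.0, 0.5, 1.0, 2.0, 3.0])  # hypothetical feature values

# Explicit bin edges give the bins [0, 1), [1, 2), [2, 3];
# only the last bin includes its right edge.
counts, edges = np.histogram(values, bins=[0, 1, 2, 3])

# "probability"-style normalization: bar heights sum to 1.
probability = counts / counts.sum()

# log1p transform, as described for x_log_scale=True.
log_values = np.log1p(values)

print(counts, probability, log_values)
```

Because np.log1p(0) is 0, the transform remains defined for zero intensities, which a plain logarithm would not be.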
- spac.visualization.interative_spatial_plot(adata, annotations, dot_size=1.5, dot_transparancy=0.75, colorscale='viridis', figure_width=6, figure_height=4, figure_dpi=200, font_size=12, stratify_by=None, defined_color_map=None, **kwargs)[source]
Create an interactive scatter plot for spatial data using provided annotations.
- Parameters:
adata (AnnData) – Annotated data matrix object, must have a .obsm attribute with ‘spatial’ key.
annotations (list of str or str) – Column(s) in adata.obs that contain the annotations to plot. If a single string is provided, it will be converted to a list. The interactive plot will show all the labels in the annotation columns passed.
dot_size (float, optional) – Size of the scatter dots in the plot. Default is 1.5.
dot_transparancy (float, optional) – Transparency level of the scatter dots. Default is 0.75.
colorscale (str, optional) – Name of the color scale to use for the dots. Default is 'viridis'.
figure_width (int, optional) – Width of the figure in inches. Default is 6.
figure_height (int, optional) – Height of the figure in inches. Default is 4.
figure_dpi (int, optional) – DPI (dots per inch) for the figure. Default is 200.
font_size (int, optional) – Font size for text in the plot. Default is 12.
stratify_by (str, optional) – Column in adata.obs to stratify the plot. Default is None.
defined_color_map (str, optional) – Predefined color mapping stored in adata.uns for specific labels. Default is None, which will generate the color mapping automatically.
**kwargs – Additional keyword arguments for customization.
- Returns:
A list of dictionaries, each containing the following keys:
- "image_name" (str): The name of the generated image.
- "image_object": The Plotly Figure object.
- Return type:
list of dict
Notes
This function is tailored for spatial single-cell data and expects the AnnData object to have spatial coordinates in its .obsm attribute under the ‘spatial’ key.
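One way to consume the returned list of dictionaries is to write each figure to a standalone HTML file. The helper below is a hypothetical convenience and is not part of SPAC. It assumes only the two documented keys and that each "image_object" exposes Plotly's Figure.write_html method. Whether "image_name" already carries a file extension is not stated in the documentation, so the sketch appends ".html" only when it is missing.

```python
from pathlib import Path


def save_interactive_plots(results, out_dir):
    """Write each returned figure to an HTML file inside out_dir.

    `results` is assumed to be the list of dicts described above, with
    the keys "image_name" and "image_object".
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for item in results:
        name = item["image_name"]
        # Assumption: add the extension only if the name lacks one.
        if not name.endswith(".html"):
            name = f"{name}.html"
        target = out_path / name
        # write_html is the Plotly Figure method for standalone HTML export.
        item["image_object"].write_html(str(target))
        written.append(target)
    return written
```

A call such as save_interactive_plots(results, "plots") would then return the list of paths that were written.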
- spac.visualization.plot_ripley_l(adata, phenotypes, annotation=None, regions=None, sims=False, return_df=False, **kwargs)[source]
Plot Ripley’s L statistic for multiple bins and different regions for a given pair of phenotypes.
- Parameters:
adata (AnnData) – AnnData object containing Ripley’s L results in adata.uns[‘ripley_l’].
phenotypes (tuple of str) – A tuple of two phenotypes: (center_phenotype, neighbor_phenotype).
regions (list of str, optional) – A list of region labels to plot. If None, plot all available regions. Default is None.
sims (bool, optional) – Whether to plot the simulation results. Default is False.
return_df (bool, optional) – Whether to return the DataFrame containing the Ripley’s L results.
kwargs (dict, optional) – Additional keyword arguments to pass to seaborn.lineplot.
- Raises:
ValueError – If the Ripley L results are not found in adata.uns[‘ripley_l’].
- Returns:
ax (matplotlib.axes.Axes) – The Axes object containing the plot, which can be further modified.
df (pandas.DataFrame, optional) – The DataFrame containing the Ripley’s L results, if return_df is True.
Example
>>> ax = plot_ripley_l(
...     adata,
...     phenotypes=('Phenotype1', 'Phenotype2'),
...     regions=['region1', 'region2'])
>>> plt.show()
This returns the Axes object for further customization and displays the plot.
- spac.visualization.relational_heatmap(adata: AnnData, source_annotation: str, target_annotation: str, color_map: str = 'mint', **kwargs)[source]
Generates a relational heatmap from the given AnnData object. The color map refers to matplotlib color maps; the default is mint. For more information on colormaps, see: https://matplotlib.org/stable/users/explain/colors/colormaps.html
- Parameters:
adata (anndata.AnnData) – The annotated data matrix.
source_annotation (str) – The source annotation to use for the relational heatmap.
target_annotation (str) – The target annotation to use for the relational heatmap.
color_map (str) – The color map to use for the relational heatmap. Default is mint.
**kwargs (dict, optional) – Additional keyword arguments. For example, you can pass font_size=12.0.
- Returns:
A dictionary containing:
- "figure" (plotly.graph_objs._figure.Figure): The generated relational heatmap as a Plotly figure.
- "file_name" (str): The name of the file where the relational matrix can be saved.
- "data" (pandas.DataFrame): A relational matrix DataFrame with percentage values. Rows represent source annotations, columns represent target annotations, and an additional "total" column sums the percentages for each source.
- Return type:
dict
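The shape of the "data" matrix described above can be sketched with pandas.crosstab. This is not SPAC's code. The toy labels are hypothetical, and the choice to express each cell as a percentage of its source row is an assumption, since the documentation does not say which denominator is used.

```python
import pandas as pd

# Hypothetical per-cell labels for two annotations.
obs = pd.DataFrame(
    {
        "source": ["A", "A", "A", "B"],
        "target": ["x", "x", "y", "y"],
    }
)

# Assumption: percentages are computed within each source label (row-wise).
matrix = pd.crosstab(obs["source"], obs["target"], normalize="index") * 100

# Extra "total" column summing the percentages for each source.
matrix["total"] = matrix.sum(axis=1)

print(matrix)
```

Under this row-wise assumption every total is 100. A different denominator (for example, all cells) would instead make the totals reflect each source label's share of the data.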
- spac.visualization.sankey_plot(adata: AnnData, source_annotation: str, target_annotation: str, source_color_map: str = 'tab20', target_color_map: str = 'tab20c', sankey_font: float = 12.0, prefix: bool = True)[source]
Generates a Sankey plot from the given AnnData object. The color maps refer to matplotlib color maps; the defaults are tab20 for the source annotation and tab20c for the target annotation. For more information on colormaps, see: https://matplotlib.org/stable/users/explain/colors/colormaps.html
- Parameters:
adata (anndata.AnnData) – The annotated data matrix.
source_annotation (str) – The source annotation to use for the Sankey plot.
target_annotation (str) – The target annotation to use for the Sankey plot.
source_color_map (str) – The color map to use for the source nodes. Default is tab20.
target_color_map (str) – The color map to use for the target nodes. Default is tab20c.
sankey_font (float, optional) – The font size to use for the Sankey plot. Defaults to 12.0.
prefix (bool, optional) – Whether to prefix the target labels with the source labels. Defaults to True.
- Returns:
The generated Sankey plot.
- Return type:
plotly.graph_objs._figure.Figure
- spac.visualization.spatial_plot(adata, spot_size, alpha, vmin=-999, vmax=-999, annotation=None, feature=None, layer=None, ax=None, **kwargs)[source]
Generate the spatial plot of selected features.
- Parameters:
adata (anndata.AnnData) – The AnnData object containing the target feature and spatial coordinates.
spot_size (int) – The size of the spots on the spatial plot.
alpha (float) – The transparency of the spots, ranging from 0 (invisible) to 1 (solid).
vmin (float or int) – The lower limit of the feature value for visualization.
vmax (float or int) – The upper limit of the feature value for visualization.
feature – The feature to visualize on the spatial plot. Default None.
annotation (str) – The annotation to visualize in the spatial plot. Cannot be set together with feature. Default None.
layer (str) – Name of the AnnData layer to plot. By default adata.raw.X is plotted.
ax (matplotlib.axes.Axes) – The matplotlib Axes containing the analysis plots. The returned ax is the passed ax or a newly created one. Only works if plotting a single component.
**kwargs – Arguments to pass to matplotlib.pyplot.scatter()
- Returns:
A single Axes object or a list of Axes objects.
- Return type:
matplotlib.axes.Axes or list of matplotlib.axes.Axes
- spac.visualization.threshold_heatmap(adata, feature_cutoffs, annotation, layer=None, swap_axes=False, **kwargs)[source]
Creates a heatmap for each feature, categorizing intensities into low, medium, and high based on provided cutoffs.
- Parameters:
adata (anndata.AnnData) – AnnData object containing the feature intensities in .X attribute or specified layer.
feature_cutoffs (dict) – Dictionary with feature names as keys and tuples with two intensity cutoffs as values.
annotation (str) – Column name in .obs DataFrame that contains the annotation used for grouping.
layer (str, optional) – Layer name in adata.layers to use for intensities. If None, uses .X attribute.
swap_axes (bool, optional) – If True, swaps the axes of the heatmap.
**kwargs (keyword arguments) – Additional keyword arguments to pass to scanpy’s heatmap function.
- Returns:
A dictionary containing the axes of the figures generated by the scanpy heatmap function. Consistent key: 'heatmap_ax'. Potential keys include 'groupby_ax', 'dendrogram_ax', and 'gene_groups_ax'.
- Return type:
dict of Axes
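The low/medium/high categorization driven by feature_cutoffs can be illustrated with np.digitize. This is a sketch, not SPAC's implementation. The feature name and values are hypothetical, and the boundary handling (a value equal to a cutoff goes to the higher category) is an assumption that the documentation does not specify.

```python
import numpy as np

# Hypothetical cutoffs in the documented shape: feature -> (low, high).
feature_cutoffs = {"marker_a": (1.0, 5.0)}
intensities = {"marker_a": np.array([0.2, 1.0, 3.0, 5.0, 9.0])}

labels = np.array(["low", "medium", "high"])
categories = {}
for feature, (low, high) in feature_cutoffs.items():
    # np.digitize returns 0 below `low`, 1 in [low, high),
    # and 2 at or above `high` (assumed boundary convention).
    codes = np.digitize(intensities[feature], bins=[low, high])
    categories[feature] = labels[codes]

print(categories["marker_a"])
```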
- spac.visualization.tsne_plot(adata, color_column=None, ax=None, **kwargs)[source]
Visualize scatter plot in tSNE basis.
- Parameters:
adata (anndata.AnnData) – The AnnData object with t-SNE coordinates precomputed by the ‘tsne’ function and stored in ‘adata.obsm[“X_tsne”]’.
color_column (str, optional) – The name of the column to use for coloring the scatter plot points.
ax (matplotlib.axes.Axes, optional (default: None)) – A matplotlib axes object to plot on. If not provided, a new figure and axes will be created.
**kwargs – Parameters passed to scanpy.pl.tsne function.
- Returns:
fig (matplotlib.figure.Figure) – The created figure for the plot.
ax (matplotlib.axes.Axes) – The axes of the tsne plot.
- spac.visualization.visualize_2D_scatter(x, y, labels=None, point_size=None, theme=None, ax=None, annotate_centers=False, x_axis_title='Component 1', y_axis_title='Component 2', plot_title=None, color_representation=None, **kwargs)[source]
Visualize 2D data using plt.scatter.
- Parameters:
x (array-like) – x-coordinates of the data.
y (array-like) – y-coordinates of the data.
labels (array-like, optional) – Array of labels for the data points. Can be numerical or categorical.
point_size (float, optional) – Size of the points. If None, it will be automatically determined.
theme (str, optional) – Color theme for the plot. Defaults to ‘viridis’ if theme not recognized. For a list of supported themes, refer to Matplotlib’s colormap documentation: https://matplotlib.org/stable/tutorials/colors/colormaps.html
ax (matplotlib.axes.Axes, optional (default: None)) – Matplotlib axis object. If None, a new one is created.
annotate_centers (bool, optional (default: False)) – Annotate the centers of clusters if labels are categorical.
x_axis_title (str, optional) – Title for the x-axis.
y_axis_title (str, optional) – Title for the y-axis.
plot_title (str, optional) – Title for the plot.
color_representation (str, optional) – Description of what the colors represent.
**kwargs – Additional keyword arguments passed to plt.scatter.
- Returns:
fig (matplotlib.figure.Figure) – The figure of the plot.
ax (matplotlib.axes.Axes) – The axes of the plot.
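For annotate_centers, a cluster center has to be derived from the points that share a label. The sketch below takes the center to be the per-label mean of x and y. That definition is an assumption for illustration, and the library may compute centers differently. All values are hypothetical.

```python
import numpy as np

x = np.array([0.0, 2.0, 10.0, 12.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
labels = np.array(["a", "a", "b", "b"])  # categorical labels

# Assumption: a cluster's center is the mean coordinate of its points.
centers = {
    str(label): (float(x[labels == label].mean()), float(y[labels == label].mean()))
    for label in np.unique(labels)
}

print(centers)
```

Each center could then be passed to Matplotlib's Axes.annotate or Axes.text to place the label on the scatter plot.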
- spac.visualization.visualize_nearest_neighbor(adata, annotation, stratify_by=None, spatial_distance='spatial_distance', distance_from=None, distance_to=None, facet_plot=False, plot_type=None, log=False, method=None, **kwargs)[source]
Visualize nearest-neighbor (spatial distance) data between groups of cells as numeric or distribution plots.
This user-facing function assembles the data by calling _prepare_spatial_distance_data and then creates plots through _plot_spatial_distance_dispatch.
- Plot arrangement logic:
If stratify_by is not None and facet_plot=True => single figure with subplots (faceted).
If stratify_by is not None and facet_plot=False => multiple separate figures, one per group.
If stratify_by is None => a single figure with one plot.
- Parameters:
adata (anndata.AnnData) – Annotated data matrix with distances in adata.obsm[spatial_distance].
annotation (str) – Column in adata.obs containing cell phenotypes or annotations.
stratify_by (str, optional) – Column in adata.obs used to group or stratify data (e.g. imageid).
spatial_distance (str, optional) – Key in adata.obsm storing the distance DataFrame. Default is ‘spatial_distance’.
distance_from (str) – Reference phenotype from which distances are measured. Required.
distance_to (str or list of str, optional) – Target phenotype(s) to measure distance to. If None, uses all available phenotypes.
facet_plot (bool, optional) – If True (and stratify_by is not None), draw the groups as subplots in a single figure. Otherwise, create a single figure, or one figure per group when stratify_by is set.
plot_type (str, optional) – For method=’numeric’: ‘box’, ‘violin’, ‘boxen’, etc. For method=’distribution’: ‘hist’, ‘kde’, ‘ecdf’, etc.
log (bool, optional) – If True, applies np.log1p transform to the distance values.
method ({'numeric', 'distribution'}) – Determines the plotting style (catplot vs displot).
**kwargs (dict) – Additional arguments for seaborn figure-level functions.
- Returns:
A dictionary with the following keys:
- "data" (pd.DataFrame): Tidy DataFrame used for plotting.
- "fig" (Figure or list[Figure]): Single or multiple figures.
- Return type:
dict
- Raises:
ValueError – If required parameters are missing or invalid.
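The "tidy DataFrame used for plotting" and the log option can be pictured with a short pandas sketch. It does not reproduce the internal helpers. The wide layout assumed here (one row per cell, one column per target phenotype) and every name and value are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical wide distance table: rows are cells, columns are the
# nearest-neighbor distances to each target phenotype.
distances = pd.DataFrame(
    {"Stroma": [0.0, 3.0, 7.0], "Immune": [1.0, 0.0, 4.0]},
    index=["cell_1", "cell_2", "cell_3"],
)
phenotype = pd.Series(["Tumor", "Immune", "Tumor"], index=distances.index)

# Keep only cells of the reference phenotype (distance_from='Tumor').
from_tumor = distances[phenotype == "Tumor"]

# Reshape wide to tidy: one row per (cell, target phenotype) pair.
tidy = from_tumor.rename_axis("cell").reset_index().melt(
    id_vars="cell", var_name="phenotype", value_name="distance"
)

# log1p transform, as described for log=True.
tidy["log_distance"] = np.log1p(tidy["distance"])

print(tidy)
```

A long table of this kind is the input format that seaborn's figure-level functions (catplot, displot) expect, with the phenotype column available for grouping.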
Examples
>>> # Numeric box plot comparing Tumor distances to multiple targets
>>> res = visualize_nearest_neighbor(
...     adata=my_adata,
...     annotation='cell_type',
...     stratify_by='sample_id',
...     spatial_distance='spatial_distance',
...     distance_from='Tumor',
...     distance_to=['Stroma', 'Immune'],
...     facet_plot=True,
...     plot_type='box',
...     method='numeric'
... )
>>> df_long, fig = res["data"], res["fig"]
>>> # Distribution plot (kde) for a single target, single figure
>>> res2 = visualize_nearest_neighbor(
...     adata=my_adata,
...     annotation='cell_type',
...     distance_from='Tumor',
...     distance_to='Stroma',
...     method='distribution',
...     plot_type='kde'
... )
>>> df_dist, fig2 = res2["data"], res2["fig"]