spac.utils module

spac.utils.annotation_category_relations(adata, source_annotation, target_annotation, prefix=False)[source]

Calculates the count of unique relationships between two annotations in an AnnData object. Relationship is defined as a unique pair of values, one from the ‘source_annotation’ and one from the ‘target_annotation’.

Returns a DataFrame with columns ‘source_annotation’, ‘target_annotation’, ‘count’, ‘percentage_source’, and ‘percentage_target’. Here ‘count’ is the number of occurrences of each relationship, ‘percentage_source’ is that count as a percentage of the total count of the source label, and ‘percentage_target’ is that count as a percentage of the total count of the target label.

If prefix is set to True, the prefixes “source_” and “target_” are added to the labels in the “source” and “target” columns, respectively.

Parameters:
  • adata (AnnData) – The annotated data matrix of shape n_obs * n_vars. Rows correspond to cells and columns to genes.

  • source_annotation (str) – The name of the source annotation column in the adata object.

  • target_annotation (str) – The name of the target annotation column in the adata object.

  • prefix (bool, optional) – If True, adds the prefixes “source_” and “target_” to the labels in the “source” and “target” columns, respectively.

Returns:

relationships – A DataFrame with the source and target categories, their counts and their percentages.

Return type:

pandas.DataFrame
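
The counting and percentage logic described above can be sketched with plain pandas. This is a minimal illustration, not the spac implementation: the DataFrame stands in for adata.obs, the annotation names “phenotype” and “region” are hypothetical, and the 0 to 100 percentage scale is an assumption.

```python
import pandas as pd

# Toy stand-in for adata.obs; the column names are hypothetical.
obs = pd.DataFrame({
    "phenotype": ["A", "A", "A", "B"],
    "region": ["X", "X", "Y", "Y"],
})

# Count every unique (source, target) pair that occurs.
relationships = (
    obs.groupby(["phenotype", "region"])
    .size()
    .reset_index(name="count")
)

# Express each count as a share of its source label and of its target label.
source_total = relationships.groupby("phenotype")["count"].transform("sum")
target_total = relationships.groupby("region")["count"].transform("sum")
relationships["percentage_source"] = 100 * relationships["count"] / source_total
relationships["percentage_target"] = 100 * relationships["count"] / target_total

print(relationships)
```

In this toy data the pair (A, X) occurs twice, which is two thirds of all “A” cells and all of the “X” cells.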

spac.utils.check_annotation(adata, annotations=None, parameter_name=None, should_exist=True)[source]

Perform common error checks for annotations in anndata related objects.

Parameters:
  • adata (anndata.AnnData) – The AnnData object to be checked.

  • annotations (str or list of str, optional) – The annotation(s) to check for existence in adata.obs.

  • should_exist (bool, optional (default=True)) – Determines whether to check if elements exist in the target list (True), or if they should not exist (False).

Raises:
  • TypeError – If adata is not an instance of anndata.AnnData.

  • ValueError – If any of the specified layers, annotations, or features do not exist.

spac.utils.check_column_name(column_name, field_name, symbol_checklist='!?,.')[source]

spac.utils.check_distances(distances)[source]

Check that the distances are valid: must be an array-like of incremental positive values.

Parameters:

distances (list, tuple, or np.ndarray) – The list of increasing distances for the neighborhood profile.

Returns:

None

Raises:

ValueError or TypeError – If the distances are invalid.

Notes

The distances must be a list of positive real numbers and must be monotonically increasing.
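
A minimal sketch of such a validation, assuming a hypothetical helper named validate_distances. It only handles lists and tuples to stay dependency-free, reads “increasing” as strictly increasing, and chooses which exception goes with which failure; all three are assumptions rather than documented behavior.

```python
from numbers import Real


def validate_distances(distances):
    """Hypothetical stand-in for the checks described above."""
    if not isinstance(distances, (list, tuple)):
        raise TypeError("distances must be a list or tuple")
    for d in distances:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(d, bool) or not isinstance(d, Real):
            raise TypeError(f"distance {d!r} is not a real number")
        if d <= 0:
            raise ValueError(f"distance {d!r} is not positive")
    # Assumption: "increasing" is read as strictly increasing.
    for previous, current in zip(distances, distances[1:]):
        if current <= previous:
            raise ValueError("distances must be increasing")


validate_distances([10, 25, 50.5])  # passes silently and returns None
```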

spac.utils.check_feature(adata, features=None, should_exist=True)[source]

Perform common error checks for features in anndata related objects.

Parameters:
  • adata (anndata.AnnData) – The AnnData object to be checked.

  • features (str or list of str, optional) – The feature(s) to check for existence in adata.var_names.

  • should_exist (bool, optional (default=True)) – Determines whether to check if elements exist in the target list (True), or if they should not exist (False).

Raises:
  • TypeError – If adata is not an instance of anndata.AnnData.

  • ValueError – If any of the specified layers, annotations, or features do not exist.

spac.utils.check_label(adata, annotation, labels=None, should_exist=True, warning=False)[source]

Check if specified labels exist in a given annotation column in adata.obs.

This function verifies whether all or none of the specified labels exist in the provided annotation column of an AnnData object. It ensures that the input labels align with the expected categories present in adata.obs[annotation].

Parameters:
  • adata (anndata.AnnData) – The AnnData object containing the annotation column.

  • annotation (str) – The name of the annotation column in adata.obs to check against.

  • labels (str or list of str, optional) – The label or list of labels to check for existence in the specified annotation column. If None, no validation will be performed.

  • should_exist (bool, optional (default=True)) – Determines whether to check if elements exist in the target column (True), or if they should not exist (False).

  • warning (bool, optional (default=False)) – If True, generate a warning instead of raising an exception if the specified condition is not met.

Raises:
  • TypeError – If adata is not an instance of anndata.AnnData.

  • ValueError – If the specified annotation does not exist in adata.obs. If should_exist is True and any label does not exist in the annotation column. If should_exist is False and any label already exists in the annotation column.

Warns:

UserWarning – If the specified condition is not met and warning is True.

Example

>>> check_label(adata, "cell_type", "B_cell")
>>> check_label(
...     adata, "cluster", ["Cluster1", "Cluster2"], should_exist=True
... )

spac.utils.check_list_in_list(input, input_name, input_type, target_list, need_exist=True, warning=False)[source]

Check if items in a given list exist in a target list.

This function is used to validate whether all or none of the items in a given list exist in a target list. It helps to ensure that the input list contains only valid elements that are present in the target list.

Parameters:
  • input (str or list of str or None) – The input list or a single string element. If it is a string, it will be converted to a list containing only that string element. If None, no validation will be performed.

  • input_name (str) – The name of the input list used for displaying helpful error messages.

  • input_type (str) – The type of items in the input list (e.g., “item”, “element”, “category”).

  • target_list (list of str) – The target list containing valid items that the input list elements should be compared against.

  • need_exist (bool, optional (default=True)) – Determines whether to check if elements exist in the target list (True), or if they should not exist (False).

  • warning (bool, optional (default=False)) – If True, generate a warning instead of raising an exception.

Raises:

ValueError – If the input is not a string or a list of strings. If need_exist is True and any element of the input list does not exist in the target list. If need_exist is False and any element of the input list exists in the target list.

Warns:

UserWarning – If the specified behavior is not present and warning is True.
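
The interplay of need_exist and warning can be sketched as follows. The helper validate_items is hypothetical and only mirrors the behavior described in this entry; the message wording is an assumption.

```python
import warnings


def validate_items(items, target_list, need_exist=True, warning=False):
    """Hypothetical helper mirroring the behaviour described above."""
    if items is None:
        return  # nothing to validate
    if isinstance(items, str):
        items = [items]  # a single string becomes a one-element list
    if not isinstance(items, list) or not all(
        isinstance(item, str) for item in items
    ):
        raise ValueError("input must be a string or a list of strings")

    if need_exist:
        offending = [item for item in items if item not in target_list]
        message = f"not found in target list: {offending}"
    else:
        offending = [item for item in items if item in target_list]
        message = f"already present in target list: {offending}"

    if offending:
        if warning:
            warnings.warn(message, UserWarning)
        else:
            raise ValueError(message)


validate_items("CD4", ["CD4", "CD8"])                    # passes
validate_items(["new"], ["CD4", "CD8"], need_exist=False)  # passes
```

With warning=True the same failing inputs emit a UserWarning and execution continues, which suits checks that should not abort a pipeline.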

spac.utils.check_table(adata, tables=None, should_exist=True, associated_table=False, warning=False)[source]

Perform common error checks for table (layers) or derived tables (obsm) in anndata related objects.

Parameters:
  • adata (anndata.AnnData) – The AnnData object to be checked.

  • tables (str or list of str, optional) – The term “table” is equivalent to layer in anndata structure. The layer(s) to check for existence in adata.layers.keys().

  • should_exist (bool, optional (default=True)) – Determines whether to check if elements exist in the target list (True), or if they should not exist (False).

  • associated_table (bool, optional (default=False)) – Determines whether the passed table names should exist as layers or in obsm in the anndata object.

  • warning (bool, optional (default=False)) – If True, generate a warning instead of raising an exception.

Raises:
  • TypeError – If adata is not an instance of anndata.AnnData.

  • ValueError – If any of the specified layers, annotations, obsm, or features do not exist.

Warns:

UserWarning – If any of the specified layers, annotations, obsm, or features do not exist, and warning is True.

spac.utils.color_mapping(labels, color_map='viridis', opacity=1.0, rgba_mode=True, return_dict=False)[source]

Map a list of labels to colors using a Matplotlib colormap and opacity.

This function assigns a unique color to each label in the provided list using a specified colormap from Matplotlib. The generated colors can be returned in either rgba or rgb format, suitable for visualization in libraries like Plotly.

The function supports both continuous and discrete colormaps:
  • Continuous colormaps interpolate smoothly between colors across a range.

  • Discrete colormaps have a fixed number of distinct colors, and labels are distributed evenly across these colors.

Opacity can be set with a value between 0 (fully transparent) and 1 (fully opaque). The resulting colors are CSS-compatible strings.
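
As a rough illustration of the CSS-compatible strings, the sketch below formats one RGBA tuple the way a Matplotlib colormap would return it (floats between 0 and 1). The helper to_css_color is hypothetical, the tuple is an approximate hard-coded value for the start of viridis, and truncating channels to integers is an assumption.

```python
def to_css_color(rgba, opacity=1.0, rgba_mode=True):
    """Hypothetical formatter: float RGBA in [0, 1] -> CSS color string."""
    if not 0 <= opacity <= 1:
        raise ValueError("opacity must be in the range [0, 1]")
    # Assumption: channels are scaled to 0-255 and truncated to integers.
    r, g, b = (int(channel * 255) for channel in rgba[:3])
    if rgba_mode:
        return f"rgba({r}, {g}, {b}, {opacity})"
    return f"rgb({r}, {g}, {b})"


# Approximate first color of Matplotlib's viridis colormap, hard-coded here.
viridis_start = (0.267004, 0.004874, 0.329415, 1.0)
print(to_css_color(viridis_start, opacity=0.5))      # rgba(68, 1, 84, 0.5)
print(to_css_color(viridis_start, rgba_mode=False))  # rgb(68, 1, 84)
```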

Parameters:
  • labels (list) – A list of unique labels to map to colors. The number of labels determines how the colormap is sampled.

  • color_map (str, optional) – The colormap name (e.g., ‘viridis’, ‘plasma’, ‘inferno’). It must be a valid Matplotlib colormap. Default is ‘viridis’.

  • opacity (float, optional) – Opacity (alpha channel) for colors, between 0 (transparent) and 1 (opaque). Default is 1.0.

  • rgba_mode (bool, optional) – If True, returns colors in rgba format (e.g., rgba(255, 0, 0, 0.5)). If False, returns rgb format (e.g., rgb(255, 0, 0)). Default is True.

  • return_dict (bool, optional) – If True, returns a dictionary where keys are labels, and values are the corresponding colors. Default is False.

Returns:

label_colors – If return_dict is False, returns a list of color strings, one for each label. If return_dict is True, returns a dictionary with label keys and color values. The format of the colors depends on the rgba_mode parameter.

Return type:

list[str] or dict

Raises:

ValueError – If opacity is not in the range [0, 1], or if color_map is not a valid Matplotlib colormap name.

Examples

Assign colors to labels with default settings:

>>> labels = ['A', 'B', 'C']
>>> color_mapping(labels)
['rgba(68, 1, 84, 1.0)', 'rgba(58, 82, 139, 1.0)',
 'rgba(33, 145, 140, 1.0)']

Use a different colormap with reduced opacity:

>>> color_mapping(labels, color_map='plasma', opacity=0.5)
['rgba(13, 8, 135, 0.5)', 'rgba(126, 3, 167, 0.5)',
 'rgba(240, 249, 33, 0.5)']

Generate colors in rgb format:

>>> color_mapping(labels, rgba_mode=False)
['rgb(68, 1, 84)', 'rgb(58, 82, 139)', 'rgb(33, 145, 140)']

Return a dictionary of labels and colors:

>>> color_mapping(labels, return_dict=True)
{'A': 'rgba(68, 1, 84, 1.0)', 'B': 'rgba(58, 82, 139, 1.0)',
 'C': 'rgba(33, 145, 140, 1.0)'}

spac.utils.compute_boxplot_metrics(data: DataFrame, annotation=None, showfliers: bool | None = None)[source]

Compute boxplot-related statistical metrics for a given dataset efficiently.

Statistics include:
  • Lower and upper whiskers (whislo, whishi),

  • First quartile (q1),

  • Median (median),

  • Third quartile (q3),

  • Mean (mean)

  • Outliers (fliers) [If showfliers is not None]

It can identify outliers based on the ‘showfliers’ parameter, and supports efficient handling of large datasets by downsampling outliers when specified.

Parameters:
  • data (pd.DataFrame) – A pandas DataFrame containing the numerical data for which the boxplot statistics are to be computed.

  • annotation (str, optional) – The annotation used to group the features.

  • showfliers ({None, "downsample", "all"}, default = None) – Defines how outliers are handled. If ‘all’, all outliers are displayed in the boxplot. If ‘downsample’, outliers are downsampled to 10% of the original count when there are more than 10,000 of them. If None, outliers are hidden.

Returns:

metrics – A DataFrame with one row per feature/annotation grouping and columns holding the calculated metrics.

Return type:

pd.DataFrame
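
For a single column of values, the listed metrics can be sketched with the standard library alone. The helper boxplot_metrics is hypothetical; the 1.5 × IQR whisker rule and the linear-interpolation quartiles are assumptions (they follow the common Tukey convention), and only the 10,000-outlier threshold and the 10% downsampling come from the description above.

```python
import random
import statistics


def boxplot_metrics(values, showfliers=None, seed=0):
    """Hypothetical single-column version of the metrics listed above."""
    data = sorted(values)
    # "inclusive" quartiles interpolate linearly between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    metrics = {
        "whislo": inside[0], "q1": q1, "median": median,
        "q3": q3, "whishi": inside[-1], "mean": statistics.fmean(data),
    }
    if showfliers is not None:
        fliers = [v for v in data if v < low_fence or v > high_fence]
        # Downsample to 10% only when there are more than 10,000 outliers.
        if showfliers == "downsample" and len(fliers) > 10_000:
            fliers = random.Random(seed).sample(fliers, len(fliers) // 10)
        metrics["fliers"] = fliers
    return metrics


print(boxplot_metrics([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], showfliers="all"))
```

For this input the quartiles are 3.25, 5.5 and 7.75, the whiskers end at 1 and 9, and 100 is the only outlier.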

spac.utils.get_defined_color_map(adata, defined_color_map=None, annotations=None, colorscale='viridis')[source]

Retrieve or generate a predefined color mapping dictionary from an AnnData object.

If defined_color_map is provided and found within adata.uns, the corresponding dictionary is returned. Otherwise, if it is not provided, a color mapping is generated using the unique values of the annotation column specified by annotations and the given colorscale.

Parameters:
  • adata (anndata.AnnData) – Annotated data matrix object that should contain a color mapping in its uns attribute if a predefined mapping is desired.

  • defined_color_map (str, optional) – The key in adata.uns that holds the predefined color mapping. If None, a new mapping is generated using annotations.

  • annotations (str, optional) – The annotation column name in adata.obs from which to obtain unique labels if defined_color_map is not provided.

  • colorscale (str, optional) – The Matplotlib colormap name to use when generating a color mapping if defined_color_map is not provided. Default is ‘viridis’.

Returns:

A dictionary mapping unique labels to colors.

Return type:

dict

Raises:
  • TypeError – If defined_color_map is provided but is not a string.

  • ValueError – If a predefined mapping is requested but not found, or if neither defined_color_map nor annotations is provided.
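
The lookup-or-generate decision can be sketched as follows. Everything here is illustrative: resolve_color_map is a hypothetical helper, a plain dict stands in for adata.uns, a list of labels stands in for the annotation column, and grey levels replace the Matplotlib colormap to keep the sketch dependency-free.

```python
def resolve_color_map(uns, labels=None, defined_color_map=None):
    """Hypothetical sketch; `uns` is a plain dict standing in for adata.uns."""
    if defined_color_map is not None:
        if not isinstance(defined_color_map, str):
            raise TypeError("defined_color_map must be a string key")
        if defined_color_map not in uns:
            raise ValueError(f"'{defined_color_map}' not found in uns")
        return uns[defined_color_map]
    if labels is None:
        raise ValueError("provide defined_color_map or annotations")
    # Placeholder generator: evenly spaced grey levels instead of a
    # Matplotlib colormap (an assumption made for this sketch only).
    unique = sorted(set(labels))
    step = 255 // max(len(unique) - 1, 1)
    return {
        label: f"rgb({i * step}, {i * step}, {i * step})"
        for i, label in enumerate(unique)
    }


stored = {"my_colors": {"B_cell": "red", "T_cell": "blue"}}
print(resolve_color_map(stored, defined_color_map="my_colors"))
print(resolve_color_map({}, labels=["T_cell", "B_cell", "T_cell"]))
```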

spac.utils.regex_search_list(regex_patterns, list_to_search)[source]

Perform a regex (regular expression) search in a list and return the list of strings matching the regex pattern.

Parameters:
  • regex_patterns (str or list of str) – The regex pattern to look for, either a single pattern or a list of patterns.

  • list_to_search (list of str) – The list of strings to search for matches to the regex pattern.

Returns:

A list of strings containing results from search.

Return type:

list of str

Example

>>> regex_pattern = ["A", "^B.*"]
>>> list_to_search = ["ABC", "BC", "AC", "AB"]
>>> result = regex_search_list(regex_pattern, list_to_search)
>>> print(result)
['BC']
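
The result above may look surprising, since “A” occurs in three of the strings. One reading that is consistent with it is that a string is kept only when a pattern matches it in full. The sketch below implements that reading with re.fullmatch; the helper full_match_filter is hypothetical, and the full-match semantics are an assumption rather than a description of the spac implementation.

```python
import re


def full_match_filter(patterns, candidates):
    """Hypothetical helper: keep candidates fully matched by any pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]
    matched = []
    for pattern in patterns:
        for candidate in candidates:
            if re.fullmatch(pattern, candidate) and candidate not in matched:
                matched.append(candidate)
    return matched


print(full_match_filter(["A", "^B.*"], ["ABC", "BC", "AC", "AB"]))  # ['BC']
```
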
spac.utils.spell_out_special_characters(text)[source]

Convert special characters in a string to comply with NIDAP naming rules.

This function processes a string by replacing or removing disallowed characters to ensure compatibility with NIDAP. Spaces, special symbols, and certain substrings are replaced or transformed into readable and regulation-compliant equivalents.

Parameters:

text (str) – The input string to be processed and converted.

Returns:

A sanitized string with special characters replaced or removed, adhering to NIDAP naming conventions.

Return type:

str

Processing Steps

  1. Spaces are replaced with underscores (_).

  2. Substrings related to units (e.g., ‘µm²’) are replaced with text equivalents:

    • ‘µm²’ -> ‘um2’

    • ‘µm’ -> ‘um’

  3. Hyphens (-) between letters are replaced with underscores (_).

  4. Certain special symbols are mapped to readable equivalents:

    • + -> _pos_

    • - -> _neg_

    • @ -> at

    • # -> hash

    • & -> and

    • And more (see Notes section for a full mapping).

  5. Remaining disallowed characters are removed (non-alphanumeric and non-underscore characters).

  6. Consecutive underscores are consolidated into a single underscore.

  7. Leading and trailing underscores are stripped.
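
The processing steps above can be sketched in a few lines of standard-library Python. The helper sanitize_name is a hypothetical re-implementation for illustration only; its symbol map is deliberately partial, and the micro sign is written as the escape \u00b5 (an assumption about which code point the input uses).

```python
import re

# Partial, illustrative symbol map (assumption; the real map is larger).
SYMBOLS = {"+": "_pos_", "-": "_neg_", "@": "at", "#": "hash", "&": "and"}


def sanitize_name(text):
    """Hypothetical re-implementation of the seven steps listed above."""
    # Step 1: spaces become underscores.
    text = text.replace(" ", "_")
    # Step 2: unit substrings; the longer unit is replaced first.
    text = text.replace("\u00b5m\u00b2", "um2").replace("\u00b5m", "um")
    # Step 3: hyphens between letters become underscores.
    text = re.sub(r"(?<=[A-Za-z])-(?=[A-Za-z])", "_", text)
    # Step 4: remaining symbols are spelled out.
    for symbol, word in SYMBOLS.items():
        text = text.replace(symbol, word)
    # Step 5: drop anything that is not alphanumeric or an underscore.
    text = re.sub(r"[^A-Za-z0-9_]", "", text)
    # Step 6: collapse runs of underscores.
    text = re.sub(r"_+", "_", text)
    # Step 7: strip leading and trailing underscores.
    return text.strip("_")


print(sanitize_name("Data \u00b5m\u00b2+Analysis #1-2"))
# Data_um2_pos_Analysis_hash1_neg_2
```

The order matters: hyphens between letters are handled before the symbol map, so “T-cell” becomes “T_cell” rather than “T_neg_cell”.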

Notes

The following special character mappings are used:
  • µ -> u

  • ² -> 2

  • / -> slash

  • = -> equals

  • ! -> exclamation

  • | -> pipe

For a complete list, refer to the special_char_map in the code.

Example

>>> spell_out_special_characters("Data µm²+Analysis #1-2")
'Data_um2_pos_Analysis_hash1_neg_2'
>>> spell_out_special_characters("Invalid!Char@Format")
'Invalid_exclamation_Char_at_Format'

spac.utils.text_to_others(parameter, text='None', to_None=True, to_False=False, to_True=False, to_Int=False, to_Float=False)[source]

Functions