spac.transformations module

spac.transformations.apply_per_batch(data, annotation, method, **kwargs)[source]

Apply a given function to data per batch, with additional parameters.

Parameters:
  • data (np.ndarray) – The data to transform.

  • annotation (np.ndarray) – Batch annotations for each row in the data.

  • method (str) – The function to apply to each batch. Options: ‘arcsinh_transformation’ or ‘normalize_features’.

  • kwargs – Additional parameters to pass to the function.

Returns:

The transformed data.

Return type:

np.ndarray
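
As a rough illustration of what per-batch application means, the sketch below splits the rows by batch label and transforms each group separately. It is not the library's implementation: the helper name apply_per_batch_sketch, the toy center function, and the use of a callable in place of a method-name string are all assumptions made for this example.

```python
import numpy as np


def apply_per_batch_sketch(data, annotation, func, **kwargs):
    """Hypothetical helper: apply `func` to the rows of each batch."""
    data = np.asarray(data, dtype=float)
    annotation = np.asarray(annotation)
    transformed = np.empty_like(data)
    for batch in np.unique(annotation):
        mask = annotation == batch
        # Transform only the rows of this batch and store them at the same positions.
        transformed[mask] = func(data[mask], **kwargs)
    return transformed


def center(block, scale=1.0):
    # Toy transformation: subtract the per-column mean, then scale.
    return (block - block.mean(axis=0)) * scale


data = np.array([[1.0, 10.0], [3.0, 30.0], [100.0, 0.0], [300.0, 4.0]])
batches = np.array(["a", "a", "b", "b"])
result = apply_per_batch_sketch(data, batches, center, scale=1.0)
```

Each batch is centered on its own column means, so rows from batch "a" are unaffected by the much larger values in batch "b".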

spac.transformations.arcsinh_transformation(adata, input_layer=None, co_factor=None, percentile=None, output_layer='arcsinh', per_batch=False, annotation=None)[source]

Apply arcsinh transformation using a co-factor (fixed number) or a given percentile of each feature. This transformation can be applied to the entire dataset or per batch based on provided parameters.

Computes the co-factor or percentile for each biomarker individually, ensuring proper scaling based on its unique range of expression levels.

Parameters:
  • adata (anndata.AnnData) – The AnnData object containing the data to transform.

  • input_layer (str, optional) – The name of the layer in the AnnData object to transform. If None, the main data matrix .X is used.

  • co_factor (float, optional) – A fixed positive number to use as a co-factor for the transformation.

  • percentile (float, optional) – The percentile used to determine the co-factor if co_factor is not provided. The percentile is computed for each feature (column) individually.

  • output_layer (str, default="arcsinh") – Name of the layer to put the transformed results. If it already exists, it will be overwritten with a warning.

  • per_batch (bool, optional, default=False) – Whether to apply the transformation per batch.

  • annotation (str, optional) – The name of the annotation in adata to define batches. Required if per_batch is True.

Returns:

adata – The AnnData object with the transformed data stored in the specified output_layer.

Return type:

anndata.AnnData

spac.transformations.arcsinh_transformation_core(data, co_factor=None, percentile=None)[source]

Apply arcsinh transformation using a co-factor or a percentile.

Parameters:
  • data (np.ndarray) – The data to transform.

  • co_factor (float, optional) – A fixed positive number to use as a co-factor for the transformation.

  • percentile (float, optional) – The percentile to determine the co-factor if co_factor is not provided. The percentile is computed for each feature (column) individually.

Returns:

The transformed data.

Return type:

np.ndarray

Raises:

ValueError – If both co_factor and percentile are None. If both co_factor and percentile are specified. If percentile is not in the range [0, 100].
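
The behaviour described above can be sketched in NumPy as follows. This is a hypothetical re-implementation, not the library's code: the name arcsinh_sketch is invented for the example, and the formula arcsinh(data / co_factor) is an assumption about how the co-factor is applied.

```python
import numpy as np


def arcsinh_sketch(data, co_factor=None, percentile=None):
    """Hypothetical sketch; assumes the transform is arcsinh(data / co_factor)."""
    if co_factor is None and percentile is None:
        raise ValueError("Specify either co_factor or percentile.")
    if co_factor is not None and percentile is not None:
        raise ValueError("Specify only one of co_factor or percentile.")
    data = np.asarray(data, dtype=float)
    if percentile is not None:
        if not 0 <= percentile <= 100:
            raise ValueError("percentile must be in the range [0, 100].")
        # One co-factor per feature (column).
        co_factor = np.percentile(data, percentile, axis=0)
    return np.arcsinh(data / co_factor)


data = np.array([[0.0, 0.0], [5.0, 50.0], [10.0, 100.0]])
fixed = arcsinh_sketch(data, co_factor=5.0)
per_feature = arcsinh_sketch(data, percentile=50)
```

With a fixed co-factor the two columns end up on different scales, whereas the per-feature percentile puts both columns on the same scale because each is divided by its own median.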

spac.transformations.batch_normalize(adata, annotation, output_layer, input_layer=None, method='median', log=False)[source]

Adjust the features of every marker using a normalization method.

The normalization methods are summarized here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8723144/

Parameters:
  • adata (anndata.AnnData) – The AnnData object.

  • annotation (str) – The name of the annotation in adata to define batches.

  • output_layer (str) – The name of the new layer to add to the anndata object for storing normalized data.

  • input_layer (str, optional) – The name of the layer from which to read data. If None, read from .X.

  • method ({"median", "Q50", "Q75", "z-score"}, default "median") – The normalization method to use.

  • log (bool, default False) – If True, take the log2 of features before normalization. Ensure this is boolean.

spac.transformations.get_cluster_info(adata, annotation, features=None, layer=None)[source]

Retrieve information about clusters based on specific annotation.

Parameters:
  • adata (anndata.AnnData) – The AnnData object.

  • annotation (str) – Annotation in adata.obs for cluster info.

  • features (list of str, optional) – Features (e.g., markers) for cluster metrics. Defaults to all features in adata.var_names.

  • layer (str, optional) – The layer to be used in the aggregate summaries. If None, uses adata.X.

Returns:

DataFrame with metrics for each cluster including the percentage of each cluster to the whole sample.

Return type:

pd.DataFrame
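
To make the kind of summary concrete, here is a minimal sketch that computes a count, a percentage of the whole sample, and a mean marker value per cluster. It uses plain NumPy and a dictionary rather than a DataFrame to stay dependency-light. The labels, the marker values, and the choice of the mean as the per-feature metric are assumptions for illustration only.

```python
import numpy as np

# Hypothetical cluster labels, one per cell.
labels = np.array(["T cell", "B cell", "T cell", "T cell"])
# Hypothetical single-marker intensity values, one per cell.
marker = np.array([2.0, 8.0, 4.0, 6.0])

clusters, counts = np.unique(labels, return_counts=True)
summary = {
    str(cluster): {
        "count": int(count),
        # Share of this cluster in the whole sample, in percent.
        "percentage": 100.0 * int(count) / labels.size,
        "mean_marker": float(marker[labels == cluster].mean()),
    }
    for cluster, count in zip(clusters, counts)
}
```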

spac.transformations.knn_clustering(adata, features, annotation, layer=None, k=50, output_annotation='knn', associated_table=None, missing_label='no_label', **kwargs)[source]

Calculate knn clusters using sklearn KNeighborsClassifier.

The function will add these two attributes to adata:

  • .obs[output_annotation] – The int64 class labels assigned by KNeighborsClassifier.

  • .uns[output_annotation_features] – The features used to calculate the knn clusters.

Parameters:
  • adata (anndata.AnnData) – The AnnData object.

  • features (list of str) – The variables that would be included in fitting the KNN classifier.

  • annotation (str) – The name of the annotation used for classifying the data.

  • layer (str, optional) – The layer to be used.

  • k (int, optional) – The number of nearest neighbors to be used in creating the graph.

  • output_annotation (str, optional) – The name of the annotation in adata.obs where the clusters are stored.

  • associated_table (str, optional) – If set, use the corresponding key in adata.obsm to calculate the clustering. Takes priority over the layer argument.

  • missing_label (str or int) – The value of missing annotations in adata.obs[annotation].

Returns:

adata is updated in place.

Return type:

None

spac.transformations.normalize_features(adata, low_quantile=0.02, high_quantile=0.98, interpolation='linear', input_layer=None, output_layer='normalized_feature', per_batch=False, annotation=None)[source]

Normalize the features stored in an AnnData object. Any entry lower than the value corresponding to low_quantile of the column is set to that low-quantile value, and any entry greater than the value corresponding to high_quantile is set to that high-quantile value. All entries are then scaled with (values - quantile min)/(quantile max - quantile min), so the resulting column has values in the range [0, 1].

spac.transformations.normalize_features_core(data, low_quantile=0.02, high_quantile=0.98, interpolation='linear')[source]

Normalize the features in a numpy array.

Any entry lower than the value corresponding to low_quantile of the column is set to that low-quantile value, and any entry greater than the value corresponding to high_quantile is set to that high-quantile value. All entries are then scaled with (values - quantile min)/(quantile max - quantile min), so the resulting column has values in the range [0, 1].

Parameters:
  • data (np.ndarray) – The data to be normalized.

  • low_quantile (float, optional (default: 0.02)) – The lower quantile to use for normalization. Determines the minimum value after normalization. Must be a float in the range [0, 1).

  • high_quantile (float, optional (default: 0.98)) – The higher quantile to use for normalization. Determines the maximum value after normalization. Must be a float in the range (0, 1].

  • interpolation (str, optional (default: "linear")) – The interpolation method to use when selecting the value for low and high quantile. Values can be “nearest” or “linear”.

Returns:

The normalized data.

Return type:

np.ndarray

Raises:
  • TypeError – If low_quantile or high_quantile are not numeric.

  • ValueError – If low_quantile is not less than high_quantile, or if they are out of the range [0, 1] and (0, 1], respectively.

  • ValueError – If interpolation is not one of the allowed values.
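
The clipping and scaling described above can be sketched as follows. This is a hypothetical re-implementation with an invented name (normalize_sketch), not the library's code. It relies on NumPy's default linear interpolation and omits the interpolation parameter and the type checks.

```python
import numpy as np


def normalize_sketch(data, low_quantile=0.02, high_quantile=0.98):
    """Hypothetical sketch of quantile clipping followed by min-max scaling."""
    if not 0 <= low_quantile < high_quantile <= 1:
        raise ValueError("Require 0 <= low_quantile < high_quantile <= 1.")
    data = np.asarray(data, dtype=float)
    # Per-column quantile values (NumPy's default linear interpolation).
    q_low = np.quantile(data, low_quantile, axis=0)
    q_high = np.quantile(data, high_quantile, axis=0)
    # Entries outside [q_low, q_high] are set to the nearest bound.
    clipped = np.clip(data, q_low, q_high)
    # Assumes q_high > q_low in every column; a constant column would divide by zero.
    return (clipped - q_low) / (q_high - q_low)


data = np.arange(11, dtype=float).reshape(-1, 1)  # one feature: 0, 1, ..., 10
normalized = normalize_sketch(data, low_quantile=0.1, high_quantile=0.9)
```

In this example the 0.1 and 0.9 quantiles are 1 and 9, so the values 0 and 10 are clipped to those bounds before scaling and map to 0 and 1 respectively.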

spac.transformations.phenograph_clustering(adata, features, layer=None, k=50, seed=None, output_annotation='phenograph', associated_table=None, **kwargs)[source]

Calculate automatic phenotypes using phenograph.

The function will add these two attributes to adata:

  • .obs[output_annotation] (“phenograph” by default) – The int64 class assigned by phenograph.

  • .uns[“phenograph_features”] – The features used to calculate the phenograph clusters.

Parameters:
  • adata (anndata.AnnData) – The AnnData object.

  • features (list of str) – The variables that would be included in creating the phenograph clusters.

  • layer (str, optional) – The layer to be used in calculating the phenograph clusters.

  • k (int, optional) – The number of nearest neighbors to be used in creating the graph.

  • seed (int, optional) – Random seed for reproducibility.

  • output_annotation (str, optional) – The name of the annotation in adata.obs where the clusters are stored.

  • associated_table (str, optional) – If set, use the corresponding key in adata.obsm to calculate the Phenograph clusters. Takes priority over the layer argument.

Returns:

adata – Updated AnnData object with the phenograph clusters stored in adata.obs[output_annotation]

Return type:

anndata.AnnData

spac.transformations.rename_annotations(adata, src_annotation, dest_annotation, mappings)[source]

Rename labels in a given annotation in an AnnData object based on a provided dictionary. This function modifies the adata object in-place and creates a new annotation column.

Parameters:
  • adata (anndata.AnnData) – The AnnData object.

  • src_annotation (str) – Name of the column in adata.obs containing the original labels of the source annotation.

  • dest_annotation (str) – The name of the new column to be created in the AnnData object containing the renamed labels.

  • mappings (dict) – A dictionary mapping the original annotation labels to the new labels.

Examples

>>> adata = your_anndata_object
>>> src_annotation = "phenograph"
>>> mappings = {
...     "0": "group_8",
...     "1": "group_2",
...     "2": "group_6",
...     # ...
...     "37": "group_5",
... }
>>> dest_annotation = "renamed_annotations"
>>> adata = rename_annotations(
...     adata, src_annotation, dest_annotation, mappings)

spac.transformations.run_umap(adata, n_neighbors=75, min_dist=0.1, n_components=2, metric='euclidean', random_state=0, transform_seed=42, layer=None, output_derived_feature='X_umap', associated_table=None, **kwargs)[source]

Perform UMAP analysis on the specific layer of the AnnData object or the default data.

Parameters:
  • adata (AnnData) – Annotated data matrix.

  • n_neighbors (int, default=75) – Number of neighbors to consider when constructing the UMAP. This influences the balance between preserving local and global structures in the data.

  • min_dist (float, default=0.1) – Minimum distance between points in the UMAP space. Controls how tightly the embedding is allowed to compress points together.

  • n_components (int, default=2) – Number of dimensions for embedding.

  • metric (str, optional) – Metric to compute distances in high dimensional space. Check https://umap-learn.readthedocs.io/en/latest/api.html for options. The default is ‘euclidean’.

  • random_state (int, default=0) – Seed used by the random number generator (RNG) during UMAP fitting.

  • transform_seed (int, default=42) – RNG seed during UMAP transformation.

  • layer (str, optional) – Layer of AnnData object for UMAP. Defaults to adata.X.

  • output_derived_feature (str, default='X_umap') – The name of the key in adata.obsm that will contain the UMAP coordinates.

  • associated_table (str, optional) – If set, use the corresponding key in adata.obsm to calculate the UMAP. Takes priority over the layer argument.

Returns:

adata – Updated AnnData object with UMAP coordinates stored in the obsm attribute. The key for the UMAP embedding in obsm is “X_umap” by default.

Return type:

anndata.AnnData

spac.transformations.run_utag_clustering(adata, features=None, k=15, resolution=1, max_dist=20, n_pcs=10, random_state=42, n_jobs=1, n_iterations=5, slide_key='Slide', layer=None, output_annotation='UTAG', associated_table=None, parallel=False, **kwargs)[source]

Run UTAG clustering on the AnnData object.

Parameters:
  • adata (anndata.AnnData) – The AnnData object.

  • features (list) – List of features to use for clustering or for PCA. Default (None) is to use all.

  • k (int) – The number of nearest neighbors to be used in creating the graph. Default is 15.

  • resolution (float) – Resolution parameter for the clustering; a higher resolution produces more clusters. Default is 1.

  • max_dist (float) – Maximum distance to cut edges within a graph. Default is 20.

  • n_pcs (int) – Number of principal components to use for clustering. Default is 10.

  • random_state (int) – Random state for reproducibility. Default is 42.

  • n_jobs (int) – Number of jobs to run in parallel. Default is 1.

  • n_iterations (int) – Number of iterations for the clustering. Default is 5.

  • slide_key (str) – Key of adata.obs containing information on the batch structure of the data. For image data this will often be a variable indicating the image, so that image-specific effects are removed from the data. Default is “Slide”.

Returns:

adata – Updated AnnData object with clustering results.

Return type:

anndata.AnnData

spac.transformations.tsne(adata, layer=None, **kwargs)[source]

Perform t-SNE transformation on specific layer information.

Parameters:
  • adata (anndata.AnnData) – The AnnData object.

  • layer (str) – Layer to use for the t-SNE calculation.

  • **kwargs – Parameters for scanpy.tl.tsne function.

Returns:

adata – Updated AnnData object with t-SNE coordinates.

Return type:

anndata.AnnData

spac.transformations.z_score_normalization(adata, output_layer, input_layer=None, **kwargs)[source]

Compute z-scores for the provided AnnData object.

Parameters:
  • adata (anndata.AnnData) – The AnnData object containing the data to normalize.

  • output_layer (str) – The name of the layer to store the computed z-scores.

  • input_layer (str, optional) – The name of the layer in the AnnData object to normalize. If None, the main data matrix .X is used.

  • **kwargs (dict, optional) – Additional arguments to pass to scipy.stats.zscore.
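
For reference, the computation that scipy.stats.zscore performs with its default arguments (axis=0, ddof=0) can be written directly in NumPy. The sketch below shows only that column-wise formula on a small hypothetical array; it does not show how the function reads from or writes to AnnData layers.

```python
import numpy as np

# Hypothetical data: three cells (rows) by two features (columns).
data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Column-wise z-score using the population standard deviation (ddof=0),
# matching the defaults of scipy.stats.zscore (axis=0, ddof=0).
z_scores = (data - data.mean(axis=0)) / data.std(axis=0, ddof=0)
```

After the transformation each column has mean 0 and standard deviation 1, so the two features become directly comparable despite their different original scales.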
