spac.transformations module
- spac.transformations.apply_per_batch(data, annotation, method, **kwargs)[source]
Apply a given function to data per batch, with additional parameters.
- Parameters:
data (np.ndarray) – The data to transform.
annotation (np.ndarray) – Batch annotations for each row in the data.
method (str) – The function to apply to each batch. Options: ‘arcsinh_transformation’ or ‘normalize_features’.
kwargs – Additional parameters to pass to the function.
- Returns:
The transformed data.
- Return type:
np.ndarray
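The per-batch pattern described above can be sketched with plain NumPy. This is a minimal illustration, not the library's implementation: `apply_per_batch_sketch` and `center` are hypothetical names, and the sketch accepts a callable, whereas the documented function takes `method` as a string.

```python
import numpy as np


def apply_per_batch_sketch(data, annotation, func, **kwargs):
    """Apply func to the rows of each batch and reassemble the result.

    Hypothetical helper for illustration; it takes a callable instead of
    the method name expected by the documented function.
    """
    data = np.asarray(data, dtype=float)
    annotation = np.asarray(annotation)
    out = np.empty_like(data)
    for batch in np.unique(annotation):
        mask = annotation == batch          # rows belonging to this batch
        out[mask] = func(data[mask], **kwargs)
    return out


def center(block):
    """Example transform: subtract the per-column mean of the block."""
    return block - block.mean(axis=0)


data = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0], [9.0, 90.0]])
batches = np.array(["a", "a", "b", "b"])
result = apply_per_batch_sketch(data, batches, center)
```

Each batch is transformed using only its own rows, so the column means here are computed separately for batches "a" and "b".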
- spac.transformations.arcsinh_transformation(adata, input_layer=None, co_factor=None, percentile=None, output_layer='arcsinh', per_batch=False, annotation=None)[source]
Apply arcsinh transformation using a co-factor (fixed number) or a given percentile of each feature. This transformation can be applied to the entire dataset or per batch based on provided parameters.
Computes the co-factor or percentile for each biomarker individually, ensuring proper scaling based on its unique range of expression levels.
- Parameters:
adata (anndata.AnnData) – The AnnData object containing the data to transform.
input_layer (str, optional) – The name of the layer in the AnnData object to transform. If None, the main data matrix .X is used.
co_factor (float, optional) – A fixed positive number to use as a co-factor for the transformation.
percentile (float, optional) – The percentile used to determine the co-factor if co_factor is not provided. It is computed for each feature (column) individually.
output_layer (str, default="arcsinh") – Name of the layer to put the transformed results. If it already exists, it will be overwritten with a warning.
per_batch (bool, optional, default=False) – Whether to apply the transformation per batch.
annotation (str, optional) – The name of the annotation in adata to define batches. Required if per_batch is True.
- Returns:
adata – The AnnData object with the transformed data stored in the specified output_layer.
- Return type:
anndata.AnnData
- spac.transformations.arcsinh_transformation_core(data, co_factor=None, percentile=None)[source]
Apply arcsinh transformation using a co-factor or a percentile.
- Parameters:
data (np.ndarray) – The data to transform.
co_factor (float, optional) – A fixed positive number to use as a co-factor for the transformation.
percentile (float, optional) – The percentile to determine the co-factor if co_factor is not provided. The percentile is computed for each feature (column) individually.
- Returns:
The transformed data.
- Return type:
np.ndarray
- Raises:
ValueError – If both co_factor and percentile are None, if both are specified, or if percentile is not in the range [0, 100].
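A minimal NumPy sketch of the behavior described above may help. It assumes the transformation is `arcsinh(data / co_factor)`, which this page does not state explicitly, and `arcsinh_sketch` is a hypothetical name rather than the library function.

```python
import numpy as np


def arcsinh_sketch(data, co_factor=None, percentile=None):
    """Arcsinh-transform data with a fixed or percentile-based co-factor.

    Assumption: the transform is arcsinh(data / co_factor).
    """
    data = np.asarray(data, dtype=float)
    if co_factor is None and percentile is None:
        raise ValueError("Provide either co_factor or percentile.")
    if co_factor is not None and percentile is not None:
        raise ValueError("Provide only one of co_factor or percentile.")
    if percentile is not None:
        if not 0 <= percentile <= 100:
            raise ValueError("percentile must be in the range [0, 100].")
        # One co-factor per feature (column); a zero co-factor would
        # cause a division by zero and is not handled in this sketch.
        co_factor = np.percentile(data, percentile, axis=0)
    return np.arcsinh(data / co_factor)


data = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
fixed = arcsinh_sketch(data, co_factor=5.0)       # same co-factor everywhere
per_feature = arcsinh_sketch(data, percentile=50)  # column medians: 5 and 20
```

With a percentile, each column is scaled by its own value, so features with very different expression ranges end up on comparable scales.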
- spac.transformations.batch_normalize(adata, annotation, output_layer, input_layer=None, method='median', log=False)[source]
Adjust the features of every marker using a normalization method.
The normalization methods are summarized here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8723144/
- Parameters:
adata (anndata.AnnData) – The AnnData object.
annotation (str) – The name of the annotation in adata to define batches.
output_layer (str) – The name of the new layer to add to the anndata object for storing normalized data.
input_layer (str, optional) – The name of the layer from which to read data. If None, read from .X.
method ({"median", "Q50", "Q75", "z-score"}, default "median") – The normalization method to use.
log (bool, default False) – If True, take the log2 of features before normalization. Ensure this is boolean.
- spac.transformations.get_cluster_info(adata, annotation, features=None, layer=None)[source]
Retrieve information about clusters based on specific annotation.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
annotation (str) – Annotation in adata.obs for cluster info.
features (list of str, optional) – Features (e.g., markers) for cluster metrics. Defaults to all features in adata.var_names.
layer (str, optional) – The layer to be used in the aggregate summaries. If None, uses adata.X.
- Returns:
DataFrame with metrics for each cluster, including the percentage of the whole sample that each cluster represents.
- Return type:
pd.DataFrame
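The percentage metric mentioned above can be illustrated as follows. This is only a sketch of the idea using NumPy: `cluster_percentages` is a hypothetical helper that returns a plain dictionary, not the DataFrame produced by the documented function, and it omits any per-feature summaries.

```python
import numpy as np


def cluster_percentages(labels):
    """Return {cluster: (count, percentage of all observations)}."""
    labels = np.asarray(labels)
    clusters, counts = np.unique(labels, return_counts=True)
    percentages = 100.0 * counts / counts.sum()
    return {
        str(cluster): (int(count), float(pct))
        for cluster, count, pct in zip(clusters, counts, percentages)
    }


# Stand-in for the values of one annotation column in adata.obs
labels = ["T cell", "B cell", "T cell", "T cell"]
info = cluster_percentages(labels)
```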
- spac.transformations.knn_clustering(adata, features, annotation, layer=None, k=50, output_annotation='knn', associated_table=None, missing_label='no_label', **kwargs)[source]
Calculate knn clusters using sklearn KNeighborsClassifier.
The function will add these two attributes to adata:
- .obs[output_annotation]
The assigned int64 class labels by KNeighborsClassifier
- .uns[output_annotation_features]
The features used to calculate the knn clusters
- Parameters:
adata (anndata.AnnData) – The AnnData object.
features (list of str) – The variables that would be included in fitting the KNN classifier.
annotation (str) – The name of the annotation used for classifying the data
layer (str, optional) – The layer to be used.
k (int, optional) – The number of nearest neighbors to be used in creating the graph.
output_annotation (str, optional) – The name of the annotation in adata.obs where the clusters are stored.
associated_table (str, optional) – If set, use the corresponding key in adata.obsm to calculate the clustering. Takes priority over the layer argument.
missing_label (str or int) – The value of missing annotations in adata.obs[annotation]
- Returns:
adata is updated inplace
- Return type:
None
- spac.transformations.normalize_features(adata, low_quantile=0.02, high_quantile=0.98, interpolation='linear', input_layer=None, output_layer='normalized_feature', per_batch=False, annotation=None)[source]
Normalize the features stored in an AnnData object. Any entry lower than the value corresponding to low_quantile of the column is assigned that quantile value, and any entry greater than the value corresponding to high_quantile is assigned that quantile value. Other entries are normalized with (values - quantile min)/(quantile max - quantile min). The resulting column has values in the range [0, 1].
- spac.transformations.normalize_features_core(data, low_quantile=0.02, high_quantile=0.98, interpolation='linear')[source]
Normalize the features in a numpy array.
Any entry lower than the value corresponding to low_quantile of the column is assigned that quantile value, and any entry greater than the value corresponding to high_quantile is assigned that quantile value. Other entries are normalized with (values - quantile min)/(quantile max - quantile min). The resulting column has values in the range [0, 1].
- Parameters:
data (np.ndarray) – The data to be normalized.
low_quantile (float, optional (default: 0.02)) – The lower quantile to use for normalization. Determines the minimum value after normalization. Must be a float in the range [0, 1).
high_quantile (float, optional (default: 0.98)) – The higher quantile to use for normalization. Determines the maximum value after normalization. Must be a float in the range (0, 1].
interpolation (str, optional (default: "linear")) – The interpolation method to use when selecting the value for low and high quantile. Values can be “nearest” or “linear”.
- Returns:
The normalized data.
- Return type:
np.ndarray
- Raises:
TypeError – If low_quantile or high_quantile are not numeric.
ValueError – If low_quantile is not less than high_quantile, or if they are out of the range [0, 1] and (0, 1], respectively.
ValueError – If interpolation is not one of the allowed values.
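The clip-then-rescale procedure described above can be sketched in NumPy. This is an illustrative approximation under stated assumptions: `quantile_normalize_sketch` is a hypothetical name, it always uses NumPy's default linear interpolation, and it performs only one of the documented validation checks.

```python
import numpy as np


def quantile_normalize_sketch(data, low_quantile=0.02, high_quantile=0.98):
    """Clip each column to its quantile range, then rescale to [0, 1]."""
    data = np.asarray(data, dtype=float)
    if not low_quantile < high_quantile:
        raise ValueError("low_quantile must be less than high_quantile.")
    # Per-column quantile values (linear interpolation is NumPy's default)
    low = np.quantile(data, low_quantile, axis=0)
    high = np.quantile(data, high_quantile, axis=0)
    clipped = np.clip(data, low, high)
    # Note: a constant column gives high == low and a division by zero,
    # which this sketch does not handle.
    return (clipped - low) / (high - low)


data = np.array([[0.0], [1.0], [2.0], [3.0], [100.0]])
scaled = quantile_normalize_sketch(data, low_quantile=0.25, high_quantile=0.75)
```

In this example the outlier 100.0 is clipped to the upper quantile value before rescaling, so it maps to 1 instead of compressing the other entries toward 0.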
- spac.transformations.phenograph_clustering(adata, features, layer=None, k=50, seed=None, output_annotation='phenograph', associated_table=None, **kwargs)[source]
Calculate automatic phenotypes using phenograph.
The function will add these two attributes to adata:
- .obs["phenograph"]
The assigned int64 class by phenograph
- .uns["phenograph_features"]
The features used to calculate the phenograph clusters
- Parameters:
adata (anndata.AnnData) – The AnnData object.
features (list of str) – The variables that would be included in creating the phenograph clusters.
layer (str, optional) – The layer to be used in calculating the phenograph clusters.
k (int, optional) – The number of nearest neighbors to be used in creating the graph.
seed (int, optional) – Random seed for reproducibility.
output_annotation (str, optional) – The name of the annotation in adata.obs where the clusters are stored.
associated_table (str, optional) – If set, use the corresponding key in adata.obsm to calculate the Phenograph clusters. Takes priority over the layer argument.
- Returns:
adata – Updated AnnData object with the phenograph clusters stored in adata.obs[output_annotation]
- Return type:
anndata.AnnData
- spac.transformations.rename_annotations(adata, src_annotation, dest_annotation, mappings)[source]
Rename labels in a given annotation in an AnnData object based on a provided dictionary. This function modifies the adata object in-place and creates a new annotation column.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
src_annotation (str) – Name of the column in adata.obs containing the original labels of the source annotation.
dest_annotation (str) – The name of the new column to be created in the AnnData object containing the renamed labels.
mappings (dict) – A dictionary mapping the original annotation labels to the new labels.
Examples
>>> adata = your_anndata_object
>>> src_annotation = "phenograph"
>>> mappings = {
...     "0": "group_8",
...     "1": "group_2",
...     "2": "group_6",
...     # ...
...     "37": "group_5",
... }
>>> dest_annotation = "renamed_annotations"
>>> adata = rename_annotations(
...     adata, src_annotation, dest_annotation, mappings)
- spac.transformations.run_umap(adata, n_neighbors=75, min_dist=0.1, n_components=2, metric='euclidean', random_state=0, transform_seed=42, layer=None, output_derived_feature='X_umap', associated_table=None, **kwargs)[source]
Perform UMAP analysis on the specific layer of the AnnData object or the default data.
- Parameters:
adata (AnnData) – Annotated data matrix.
n_neighbors (int, default=75) – Number of neighbors to consider when constructing the UMAP. This influences the balance between preserving local and global structures in the data.
min_dist (float, default=0.1) – Minimum distance between points in the UMAP space. Controls how tightly the embedding is allowed to compress points together.
n_components (int, default=2) – Number of dimensions for embedding.
metric (str, optional) – Metric to compute distances in high dimensional space. Check https://umap-learn.readthedocs.io/en/latest/api.html for options. The default is ‘euclidean’.
random_state (int, default=0) – Seed used by the random number generator (RNG) during UMAP fitting.
transform_seed (int, default=42) – RNG seed during UMAP transformation.
layer (str, optional) – Layer of AnnData object for UMAP. Defaults to adata.X.
output_derived_feature (str, default='X_umap') – The name of the key in adata.obsm that will contain the UMAP coordinates.
associated_table (str, optional) – If set, use the corresponding key in adata.obsm to calculate the UMAP. Takes priority over the layer argument.
- Returns:
adata – Updated AnnData object with UMAP coordinates stored in the obsm attribute. The key for the UMAP embedding in obsm is “X_umap” by default.
- Return type:
anndata.AnnData
- spac.transformations.run_utag_clustering(adata, features=None, k=15, resolution=1, max_dist=20, n_pcs=10, random_state=42, n_jobs=1, n_iterations=5, slide_key='Slide', layer=None, output_annotation='UTAG', associated_table=None, parallel=False, **kwargs)[source]
Run UTAG clustering on the AnnData object.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
features (list) – List of features to use for clustering or for PCA. Default (None) is to use all.
k (int) – The number of nearest neighbors to be used in creating the graph. Default is 15.
resolution (float) – Resolution parameter for the clustering, higher resolution produces more clusters. Default is 1.
max_dist (float) – Maximum distance to cut edges within a graph. Default is 20.
n_pcs (int) – Number of principal components to use for clustering. Default is 10.
random_state (int) – Random state for reproducibility.
n_jobs (int) – Number of jobs to run in parallel. Default is 1.
n_iterations (int) – Number of iterations for the clustering.
slide_key (str) – Key of adata.obs containing information on the batch structure of the data. In general, for image data this will often be a variable indicating the image, so that image-specific effects are removed from the data. Default is “Slide”.
- Returns:
adata – Updated AnnData object with clustering results.
- Return type:
anndata.AnnData
- spac.transformations.tsne(adata, layer=None, **kwargs)[source]
Perform t-SNE transformation on specific layer information.
- Parameters:
adata (anndata.AnnData) – The AnnData object.
layer (str, optional) – The layer to use for the t-SNE calculation.
**kwargs – Parameters for scanpy.tl.tsne function.
- Returns:
adata – Updated AnnData object with t-SNE coordinates.
- Return type:
anndata.AnnData
- spac.transformations.z_score_normalization(adata, output_layer, input_layer=None, **kwargs)[source]
Compute z-scores for the provided AnnData object.
- Parameters:
adata (anndata.AnnData) – The AnnData object containing the data to normalize.
output_layer (str) – The name of the layer to store the computed z-scores.
input_layer (str, optional) – The name of the layer in the AnnData object to normalize. If None, the main data matrix .X is used.
**kwargs (dict, optional) – Additional arguments to pass to scipy.stats.zscore.
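For reference, the z-score itself can be sketched in NumPy. This assumes the score is computed per feature (column) with the population standard deviation, which matches the defaults of scipy.stats.zscore (axis=0, ddof=0). `zscore_sketch` is a hypothetical name, and the documented function may pass different arguments.

```python
import numpy as np


def zscore_sketch(data):
    """Column-wise z-score: (value - column mean) / column std (ddof=0)."""
    data = np.asarray(data, dtype=float)
    return (data - data.mean(axis=0)) / data.std(axis=0)


data = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
z = zscore_sketch(data)
# After the transform, every column has mean 0 and standard deviation 1.
```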
Functions
- phenograph_clustering
- knn_clustering
- get_cluster_info
- tsne
- run_umap
- _validate_transformation_inputs
- _select_input_features
- batch_normalize
- rename_annotations
- normalize_features
- normalize_features_core
- arcsinh_transformation
- arcsinh_transformation_core
- z_score_normalization
- apply_per_batch
- run_utag_clustering