spac.phenotyping module

spac.phenotyping.apply_phenotypes(data_df, phenotypes_dic)

Add binary columns to the DataFrame indicating if each cell matches a phenotype.

Parameters:
  • data_df (pandas.DataFrame) – The DataFrame to which binary phenotype columns will be added.

  • phenotypes_dic (dict) – A dictionary where the keys are phenotype names and the values are dictionaries mapping column names to values.

Returns:

A dictionary where the keys are phenotype names and the values are the counts of rows that match each phenotype.

Return type:

dict

Notes

The function creates binary columns in the DataFrame for each phenotype and counts the number of rows matching each phenotype.
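
The behaviour described above can be sketched with plain pandas. This is a minimal illustration and not the SPAC implementation: the helper name sketch_apply_phenotypes, the sample columns, and the rule that a row matches only when every listed column equals its required value are all assumptions.

```python
import pandas as pd


def sketch_apply_phenotypes(data_df, phenotypes_dic):
    """Hypothetical stand-in: add one 0/1 column per phenotype and count matches."""
    counts = {}
    for name, rules in phenotypes_dic.items():
        # Start with every row matching, then narrow down rule by rule.
        mask = pd.Series(True, index=data_df.index)
        for column, value in rules.items():
            mask &= data_df[column] == value
        data_df[name] = mask.astype(int)  # binary column, added in place
        counts[name] = int(mask.sum())
    return counts


# Illustrative data only; column and phenotype names are made up.
cells = pd.DataFrame({"cd4": [0, 1, 0, 1], "cd8": [0, 0, 1, 1]})
rules = {"cd4_only": {"cd4": 1, "cd8": 0}, "double_pos": {"cd4": 1, "cd8": 1}}
counts = sketch_apply_phenotypes(cells, rules)
print(counts)
```

In this sketch the input DataFrame gains one new column per phenotype, and the returned dictionary holds the number of matching rows for each.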

spac.phenotyping.assign_manual_phenotypes(data_df, phenotypes_df, annotation='manual_phenotype', prefix='', suffix='', multiple=True, drop_binary_code=True)

Assign manual phenotypes to the DataFrame and generate summaries.

Parameters:
  • data_df (pandas.DataFrame) – The DataFrame to which manual phenotypes will be assigned.

  • phenotypes_df (pandas.DataFrame) –

    A DataFrame containing phenotype definitions with columns:

    • “phenotype_name” : str

      The name of the phenotype.

    • “phenotype_code” : str

      The code used to decode the phenotype.

  • annotation (str, optional) – The name of the column to store the combined phenotype. Default is “manual_phenotype”.

  • prefix (str, optional) – Prefix to be added to the column names. Default is ‘’.

  • suffix (str, optional) – Suffix to be added to the column names. Default is ‘’.

  • multiple (bool, optional) – Whether to concatenate the names of multiple positive phenotypes. Default is True.

  • drop_binary_code (bool, optional) – Whether to drop the binary phenotype columns. Default is True.

Returns:

A dictionary with the following keys:

  • “phenotypes_counts” : dict

    Counts of cells matching each defined phenotype.

  • “assigned_phenotype_counts” : dict

    Counts of cells matching different numbers of phenotypes.

  • “multiple_phenotypes_summary” : pandas.DataFrame

    Summary of cells with multiple phenotypes.

Return type:

dict

Notes

The function generates a combined phenotype column, prints summaries of cells matching multiple phenotypes, and returns a dictionary with detailed counts and summaries.

Examples

Suppose data_df is a DataFrame with binary phenotype columns and phenotypes_df contains the following definitions:

>>> data_df = pd.DataFrame({
...     'cd4_phenotype': [0, 1, 0, 1],
...     'cd8_phenotype': [0, 0, 1, 1]
... })
>>> phenotypes_df = pd.DataFrame([
...     {"phenotype_name": "cd4_cells", "phenotype_code": "cd4+"},
...     {"phenotype_name": "cd8_cells", "phenotype_code": "cd8+"},
...     {"phenotype_name": "cd4_cd8", "phenotype_code": "cd4+cd8+"}
... ])
>>> result = assign_manual_phenotypes(
...     data_df,
...     phenotypes_df,
...     annotation="manual",
...     prefix='',
...     suffix='_phenotype',
...     multiple=True
... )

The data_df DataFrame will be edited in place to include a new column “manual” with the combined phenotype labels:

>>> print(data_df)
   cd4_phenotype  cd8_phenotype manual
0              0              0 no_label
1              1              0 cd4_cells
2              0              1 cd8_cells
3              1              1 cd8_cells, cd4_cd8

The result dictionary contains counts and summaries as follows:

>>> print(result["phenotypes_counts"])
{'cd4_cells': 1, 'cd8_cells': 2, 'cd4_cd8': 1}
>>> print(result["assigned_phenotype_counts"])
0    1
1    2
2    1
Name: num_phenotypes, dtype: int64
>>> print(result["multiple_phenotypes_summary"])
               manual  count
0  cd8_cells, cd4_cd8      1

spac.phenotyping.combine_phenotypes(data_df, phenotype_columns, multiple=True)

Combine multiple binary phenotype columns into a new column in a vectorized manner.

Parameters:
  • data_df (pandas.DataFrame) – DataFrame containing the phenotype columns.

  • phenotype_columns (list of str) – List of binary phenotype column names.

  • multiple (bool, optional) – Whether to concatenate the names of multiple positive phenotypes. If False, any row that is positive for more than one phenotype is labeled “no_label”. Default is True.

Returns:

A Series representing the combined phenotype for each row.

Return type:

pandas.Series
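
To make the effect of multiple concrete, here is a small sketch of the combination step. It is an assumption-laden illustration, not the library code: the helper name sketch_combine is hypothetical, the “, ” separator and the “no_label” fallback are taken from the example output earlier on this page, and the row-wise apply is used for readability rather than the vectorized approach the real function describes.

```python
import pandas as pd


def sketch_combine(data_df, phenotype_columns, multiple=True):
    """Hypothetical stand-in for the combination step described above."""
    flags = data_df[phenotype_columns].astype(bool)
    n_positive = flags.sum(axis=1)
    # Join the names of the positive columns for each row.
    joined = flags.apply(
        lambda row: ", ".join(col for col in phenotype_columns if row[col]),
        axis=1,
    )
    # Rows with no positive phenotype get the fallback label.
    combined = joined.where(n_positive > 0, "no_label")
    if not multiple:
        # Rows positive for more than one phenotype also fall back.
        combined = combined.where(n_positive <= 1, "no_label")
    return combined


# Illustrative data only.
cells = pd.DataFrame({"cd4_cells": [0, 1, 0, 1], "cd8_cells": [0, 0, 1, 1]})
with_multiple = sketch_combine(cells, ["cd4_cells", "cd8_cells"], multiple=True)
without_multiple = sketch_combine(cells, ["cd4_cells", "cd8_cells"], multiple=False)
```

The last row is positive for both columns, so it is labeled with both names when multiple is True and with “no_label” when it is False.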

spac.phenotyping.decode_phenotype(data, phenotype_code, **kwargs)

Convert a phenotype code into a dictionary that maps each feature (marker) name to the value representing its classification as ‘+’ or ‘-’.

Parameters:
  • data (pandas.DataFrame) – The DataFrame containing the columns that will be used to decode the phenotype.

  • phenotype_code (str) – The phenotype code string, which should end with ‘+’ or ‘-’.

  • **kwargs (keyword arguments) –

    Optional keyword arguments to specify prefix and suffix to be added to the column names:

    • prefix : str, optional

      Prefix to be added to the column names for the feature classification. Default is ‘’.

    • suffix : str, optional

      Suffix to be added to the column names for the feature classification. Default is ‘’.

Returns:

A dictionary where the keys are column names and the values are the corresponding phenotype classification.

Return type:

dict

Raises:

ValueError – If the phenotype code does not end with ‘+’ or ‘-’, or if any column specified in the phenotype code does not exist in the DataFrame.

Notes

The function splits the phenotype code on ‘+’ and ‘-’ characters to determine the phenotype columns and values. It checks if the columns exist in the DataFrame and whether they are binary or string types to properly map values.
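
The splitting step can be illustrated with a regular expression. This is only a sketch under stated assumptions: sketch_decode is a hypothetical helper, it keeps the raw ‘+’ or ‘-’ sign as the value, and it omits both the check that the columns exist and the mapping to binary or string values that the real function performs.

```python
import re


def sketch_decode(phenotype_code, prefix="", suffix=""):
    """Hypothetical parser: 'cd4+cd8-' -> {'cd4': '+', 'cd8': '-'}."""
    if not phenotype_code or phenotype_code[-1] not in "+-":
        raise ValueError("phenotype code must end with '+' or '-'")
    # Each marker is a run of non-sign characters followed by one sign.
    pairs = re.findall(r"([^+-]+)([+-])", phenotype_code)
    return {f"{prefix}{marker.strip()}{suffix}": sign for marker, sign in pairs}


decoded = sketch_decode("cd4+cd8-", suffix="_phenotype")
print(decoded)
```

Because this sketch treats every ‘-’ as a sign, a marker name that itself contains a hyphen would be split incorrectly.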

spac.phenotyping.generate_phenotypes_dict(data_df, phenotypes_df, prefix='', suffix='')

Generate a dictionary of phenotype names to their corresponding decoding rules.

Parameters:
  • data_df (pandas.DataFrame) – The DataFrame containing the columns that will be used to decode the phenotypes.

  • phenotypes_df (pandas.DataFrame) –

    A DataFrame containing phenotype definitions with columns:

    • “phenotype_name” : str

      The name of the phenotype.

    • “phenotype_code” : str

      The code used to decode the phenotype.

  • prefix (str, optional) – Prefix to be added to the column names. Default is ‘’.

  • suffix (str, optional) – Suffix to be added to the column names. Default is ‘’.

Returns:

A dictionary where the keys are phenotype names and the values are dictionaries mapping column names to values.

Return type:

dict

Notes

The function iterates over each row in the phenotypes_df DataFrame and decodes the phenotype using the decode_phenotype function.
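
The row-by-row construction can be sketched as follows. Both helpers are hypothetical: toy_decode is a simplified placeholder for decode_phenotype that keeps only the sign of each marker, and sketch_generate_phenotypes_dict shows the shape of the resulting dictionary rather than the library's actual output values.

```python
import re

import pandas as pd


def toy_decode(code):
    # Hypothetical placeholder for decode_phenotype: marker name -> sign.
    return dict(re.findall(r"([^+-]+)([+-])", code))


def sketch_generate_phenotypes_dict(phenotypes_df, decode=toy_decode):
    """Build {phenotype_name: decoding rules} one definition row at a time."""
    rules = {}
    for _, row in phenotypes_df.iterrows():
        rules[row["phenotype_name"]] = decode(row["phenotype_code"])
    return rules


definitions = pd.DataFrame([
    {"phenotype_name": "cd4_cells", "phenotype_code": "cd4+"},
    {"phenotype_name": "cd4_cd8", "phenotype_code": "cd4+cd8+"},
])
rules = sketch_generate_phenotypes_dict(definitions)
print(rules)
```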

spac.phenotyping.is_binary_0_1(column)

Check if a pandas Series contains only binary values (0 and 1).

Parameters:

column (pandas.Series) – The pandas Series to check.

Returns:

True if the Series contains only 0 and 1, False otherwise.

Return type:

bool

Notes

The function considers a Series to be binary if it contains exactly the values 0 and 1, and no other values.
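
One way to express the rule in the note above is a set comparison on the unique values. This is a sketch, not the library source: sketch_is_binary_0_1 is a hypothetical name, and it assumes the strict reading of the note, in which both 0 and 1 must be present, so a Series containing only 1s is not treated as binary.

```python
import pandas as pd


def sketch_is_binary_0_1(column):
    """Hypothetical check: True only when the unique values are exactly {0, 1}."""
    return set(column.unique()) == {0, 1}


both_values = sketch_is_binary_0_1(pd.Series([0, 1, 1, 0]))
other_value = sketch_is_binary_0_1(pd.Series([0, 1, 2]))
only_ones = sketch_is_binary_0_1(pd.Series([1, 1, 1]))
```

Under this assumption the first call returns True and the other two return False.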
