spac.phenotyping.assign_manual_phenotypes(data_df, phenotypes_df, annotation='manual_phenotype', prefix='', suffix='', multiple=True, drop_binary_code=True)

Assign manual phenotypes to the DataFrame and generate summaries.

Parameters:
  • data_df (pandas.DataFrame) – The DataFrame to which manual phenotypes will be assigned.

  • phenotypes_df (pandas.DataFrame) –

    A DataFrame containing phenotype definitions with columns:

    • “phenotype_name” : str

      The name of the phenotype.

    • “phenotype_code” : str

      The code used to decode the phenotype.

  • annotation (str, optional) – The name of the column to store the combined phenotype. Default is “manual_phenotype”.

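  • prefix (str, optional) – Prefix added to each marker name in a phenotype code to form the matching column name in data_df. Default is '' (empty string).

  • suffix (str, optional) – Suffix added to each marker name in a phenotype code to form the matching column name in data_df (for example, '_phenotype' maps the marker cd4 to the column cd4_phenotype). Default is '' (empty string).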

  • multiple (bool, optional) – Whether to concatenate the names of multiple positive phenotypes. Default is True.

  • drop_binary_code (bool, optional) – Whether to drop the binary phenotype columns. Default is True.
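To illustrate how prefix and suffix relate a phenotype code to columns of data_df, the sketch below parses a code such as "cd4+cd8+" into the column values it would require. The helper parse_phenotype_code is hypothetical and not part of spac; treating "-" as a negative marker is also an assumption, since only "+" appears in this documentation.

```python
import re


def parse_phenotype_code(code, prefix="", suffix=""):
    """Hypothetical helper (not a spac function).

    Split a code like 'cd4+cd8-' into {column_name: required_value},
    where '+' maps to 1 and '-' (an assumption) maps to 0.
    """
    # Each marker is a run of characters other than '+'/'-', followed by a sign.
    pairs = re.findall(r"([^+-]+)([+-])", code)

    # Reject empty codes and codes with leftover text, e.g. a marker with no sign.
    rebuilt = "".join(name + sign for name, sign in pairs)
    if not pairs or rebuilt != code:
        raise ValueError(f"Malformed phenotype code: {code!r}")

    # Column names are formed as prefix + marker + suffix.
    return {
        f"{prefix}{name}{suffix}": 1 if sign == "+" else 0
        for name, sign in pairs
    }


print(parse_phenotype_code("cd4+cd8+", suffix="_phenotype"))
# {'cd4_phenotype': 1, 'cd8_phenotype': 1}
```

Under these assumptions, a suffix of '_phenotype' is what links the marker cd4 in a code to a column named cd4_phenotype.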

Returns:

A dictionary with the following keys:

  • “phenotypes_counts”: dict

    Counts of cells matching each defined phenotype.

  • “assigned_phenotype_counts”: dict

    Counts of cells matching different numbers of phenotypes.

  • “multiple_phenotypes_summary”: pandas.DataFrame

    Summary of cells with multiple phenotypes.

Return type:

dict

Notes

The function generates a combined phenotype column, prints summaries of cells matching multiple phenotypes, and returns a dictionary with detailed counts and summaries.
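The combined column and the three summaries can be approximated with plain pandas. This is a minimal sketch, not the spac implementation: it assumes one binary column per phenotype already exists (the flags DataFrame and the combine helper are hypothetical), and it only mirrors the shape of the values described above.

```python
import pandas as pd

# Hypothetical per-phenotype flags: 1 means the cell matches that phenotype.
flags = pd.DataFrame({
    "cd4_cells": [0, 1, 0, 0],
    "cd8_cells": [0, 0, 1, 1],
    "cd4_cd8":   [0, 0, 0, 1],
})


def combine(row, no_label="no_label"):
    """Join the names of all matching phenotypes, or fall back to no_label."""
    names = [name for name, hit in row.items() if hit == 1]
    return ", ".join(names) if names else no_label


# Combined label per cell and number of phenotypes each cell matches.
manual = flags.apply(combine, axis=1)
num_phenotypes = flags.sum(axis=1)

# Cells per phenotype.
phenotypes_counts = flags.sum().to_dict()

# Cells grouped by how many phenotypes they match.
assigned_phenotype_counts = num_phenotypes.value_counts().sort_index()

# Label combinations for cells that match more than one phenotype.
multiple_phenotypes_summary = (
    manual[num_phenotypes > 1]
    .value_counts()
    .rename_axis("manual")
    .reset_index(name="count")
)
```

Keeping the per-phenotype flags separate from the combined label makes each summary a single aggregation over the same table.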

Examples

Suppose data_df is a DataFrame with binary phenotype columns and phenotypes_df contains the following definitions:

>>> import pandas as pd
>>> from spac.phenotyping import assign_manual_phenotypes
>>> data_df = pd.DataFrame({
...     'cd4_phenotype': [0, 1, 0, 1],
...     'cd8_phenotype': [0, 0, 1, 1]
... })
>>> phenotypes_df = pd.DataFrame([
...     {"phenotype_name": "cd4_cells", "phenotype_code": "cd4+"},
...     {"phenotype_name": "cd8_cells", "phenotype_code": "cd8+"},
...     {"phenotype_name": "cd4_cd8", "phenotype_code": "cd4+cd8+"}
... ])
>>> result = assign_manual_phenotypes(
...     data_df,
...     phenotypes_df,
...     annotation="manual",
...     prefix='',
...     suffix='_phenotype',
...     multiple=True
... )

The data_df DataFrame is modified in place and gains a new column “manual” holding the combined phenotype labels:

>>> print(data_df)
   cd4_phenotype  cd8_phenotype              manual
0              0              0            no_label
1              1              0           cd4_cells
2              0              1           cd8_cells
3              1              1  cd8_cells, cd4_cd8

The result dictionary contains counts and summaries as follows:

>>> print(result["phenotypes_counts"])
{'cd4_cells': 1, 'cd8_cells': 2, 'cd4_cd8': 1}
>>> print(result["assigned_phenotype_counts"])
0    1
1    2
2    1
Name: num_phenotypes, dtype: int64
>>> print(result["multiple_phenotypes_summary"])
               manual  count
0  cd8_cells, cd4_cd8      1