- spac.phenotyping.assign_manual_phenotypes(data_df, phenotypes_df, annotation='manual_phenotype', prefix='', suffix='', multiple=True, drop_binary_code=True)
Assign manual phenotypes to the DataFrame and generate summaries.
- Parameters:
data_df (pandas.DataFrame) – The DataFrame to which manual phenotypes will be assigned.
phenotypes_df (pandas.DataFrame) – A DataFrame containing phenotype definitions with columns:
- “phenotype_name” : str
  The name of the phenotype.
- “phenotype_code” : str
  The code used to decode the phenotype.
annotation (str, optional) – The name of the column to store the combined phenotype. Default is “manual_phenotype”.
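prefix (str, optional) – Prefix added to each marker name in a phenotype code to form the matching column name in data_df. Default is ‘’.
suffix (str, optional) – Suffix added to each marker name in a phenotype code to form the matching column name in data_df. Default is ‘’.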
multiple (bool, optional) – Whether to concatenate the names of multiple positive phenotypes. Default is True.
drop_binary_code (bool, optional) – Whether to drop the binary phenotype columns. Default is True.
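To illustrate how prefix and suffix relate a phenotype code to column names, here is a minimal sketch. The helper `decode_code` is hypothetical and is not the library's decoder; it assumes a code is a sequence of marker names, each followed by `+` (positive) or `-` (negative), and that marker names themselves contain neither character.

```python
import re


def decode_code(code, prefix="", suffix=""):
    """Hypothetical helper: map a code such as 'cd4+cd8-' to
    {column_name: is_positive}. Not part of spac."""
    # Each match is a (marker, sign) pair, e.g. ('cd4', '+').
    pairs = re.findall(r"([^+-]+)([+-])", code)
    # Build the column name the same way the parameters describe:
    # prefix + marker + suffix.
    return {f"{prefix}{marker}{suffix}": sign == "+" for marker, sign in pairs}


decoded = decode_code("cd4+cd8-", suffix="_phenotype")
print(decoded)  # {'cd4_phenotype': True, 'cd8_phenotype': False}
```

Under these assumptions, a code written against short marker names can address columns such as `cd4_phenotype` without repeating the suffix in every code.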
- Returns:
A dictionary with the following keys:
- “phenotypes_counts”: dict
  Counts of cells matching each defined phenotype.
- “assigned_phenotype_counts”: dict
  Counts of cells matching different numbers of phenotypes.
- “multiple_phenotypes_summary”: pandas.DataFrame
  Summary of cells with multiple phenotypes.
- Return type:
dict
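The three summaries can be pictured as simple aggregations over a per-phenotype match table. The sketch below is an assumption about how such values could be derived with plain pandas, not the library's implementation; `masks` and `labels` are hypothetical inputs standing in for the function's internal state.

```python
import pandas as pd

# Hypothetical match table: one boolean column per phenotype, one row per cell.
masks = pd.DataFrame({
    "cd4_cells": [False, True, False, False],
    "cd8_cells": [False, False, True, True],
    "cd4_cd8": [False, False, False, True],
})
# Hypothetical combined labels for the same four cells.
labels = pd.Series(
    ["no_label", "cd4_cells", "cd8_cells", "cd8_cells, cd4_cd8"],
    name="manual",
)

# Cells matching each phenotype.
phenotypes_counts = {name: int(n) for name, n in masks.sum().items()}

# Number of phenotypes matched per cell, then how many cells share each number.
num_phenotypes = masks.sum(axis=1).rename("num_phenotypes")
assigned_phenotype_counts = num_phenotypes.value_counts().sort_index()

# Label combinations for cells that matched more than one phenotype.
multiple_phenotypes_summary = (
    labels[num_phenotypes > 1]
    .value_counts()
    .rename_axis("manual")
    .reset_index(name="count")
)
```

A large share of cells with more than one match usually means the phenotype codes overlap and may need extra negative markers.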
Notes
The function generates a combined phenotype column, prints summaries of cells matching multiple phenotypes, and returns a dictionary with detailed counts and summaries.
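As a rough picture of the combined column, the following sketch joins the names of every matching phenotype and falls back to "no_label" when none match. It is a simplified stand-in, not the library's code: `combine_labels` and the already-decoded `rules` mapping (column name to required binary value) are hypothetical, and only the behavior described for multiple=True is modeled.

```python
import pandas as pd

data_df = pd.DataFrame({
    "cd4_phenotype": [0, 1, 0, 1],
    "cd8_phenotype": [0, 0, 1, 1],
})

# Hypothetical decoded definitions: phenotype name -> {column: required value}.
rules = {
    "cd4_cells": {"cd4_phenotype": 1, "cd8_phenotype": 0},
    "cd8_cells": {"cd8_phenotype": 1},
    "cd4_cd8": {"cd4_phenotype": 1, "cd8_phenotype": 1},
}


def combine_labels(df, rules, no_label="no_label"):
    """Hypothetical helper: return one combined label per row of df."""
    masks = {}
    for name, required in rules.items():
        # A cell matches only if every required column has the required value.
        mask = pd.Series(True, index=df.index)
        for column, value in required.items():
            mask &= df[column] == value
        masks[name] = mask
    masks = pd.DataFrame(masks)
    # Join matching phenotype names in definition order.
    combined = [
        ", ".join(n for n, hit in zip(masks.columns, row) if hit) or no_label
        for row in masks.to_numpy()
    ]
    return pd.Series(combined, index=df.index, name="manual")


labels = combine_labels(data_df, rules)
```

The sketch returns a new Series rather than editing `data_df`, which keeps it easy to test in isolation.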
Examples
Suppose data_df is a DataFrame with binary phenotype columns and phenotypes_df contains the following definitions:
>>> import pandas as pd
>>> from spac.phenotyping import assign_manual_phenotypes
>>> data_df = pd.DataFrame({
...     'cd4_phenotype': [0, 1, 0, 1],
...     'cd8_phenotype': [0, 0, 1, 1]
... })
>>> phenotypes_df = pd.DataFrame([
...     {"phenotype_name": "cd4_cells", "phenotype_code": "cd4+cd8-"},
...     {"phenotype_name": "cd8_cells", "phenotype_code": "cd8+"},
...     {"phenotype_name": "cd4_cd8", "phenotype_code": "cd4+cd8+"}
... ])
>>> result = assign_manual_phenotypes(
...     data_df,
...     phenotypes_df,
...     annotation="manual",
...     prefix='',
...     suffix='_phenotype',
...     multiple=True
... )
The data_df DataFrame will be edited in place to include a new column “manual” with the combined phenotype labels:
>>> print(data_df)
   cd4_phenotype  cd8_phenotype              manual
0              0              0            no_label
1              1              0           cd4_cells
2              0              1           cd8_cells
3              1              1  cd8_cells, cd4_cd8
The result dictionary contains counts and summaries as follows:
>>> print(result["phenotypes_counts"])
{'cd4_cells': 1, 'cd8_cells': 2, 'cd4_cd8': 1}
>>> print(result["assigned_phenotype_counts"])
0    1
1    2
2    1
Name: num_phenotypes, dtype: int64
>>> print(result["multiple_phenotypes_summary"])
               manual  count
0  cd8_cells, cd4_cd8      1