spac.phenotyping module
- spac.phenotyping.apply_phenotypes(data_df, phenotypes_dic)
Add binary columns to the DataFrame indicating whether each cell matches a phenotype.
- Parameters:
data_df (pandas.DataFrame) – The DataFrame to which binary phenotype columns will be added.
phenotypes_dic (dict) – A dictionary where the keys are phenotype names and the values are dictionaries mapping column names to values.
- Returns:
A dictionary where the keys are phenotype names and the values are the counts of rows that match each phenotype.
- Return type:
dict
Notes
The function creates binary columns in the DataFrame for each phenotype and counts the number of rows matching each phenotype.
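As a rough illustration of the matching logic described in the notes, the sketch below re-implements the idea with plain pandas. It is not SPAC's implementation: the helper name `apply_phenotypes_sketch`, the sample columns, and the choice to name each new binary column after its phenotype are assumptions made for this example.

```python
import pandas as pd


def apply_phenotypes_sketch(data_df, phenotypes_dic):
    """Hypothetical stand-in for illustration only; not SPAC's code."""
    counts = {}
    for name, rules in phenotypes_dic.items():
        # A row matches when every listed column equals its required value.
        mask = pd.Series(True, index=data_df.index)
        for column, value in rules.items():
            mask &= data_df[column] == value
        # Assumption: the binary column is named after the phenotype.
        data_df[name] = mask.astype(int)
        counts[name] = int(mask.sum())
    return counts


cells = pd.DataFrame({"cd4": [0, 1, 0, 1], "cd8": [0, 0, 1, 1]})
rules = {
    "cd4_only": {"cd4": 1, "cd8": 0},
    "double_pos": {"cd4": 1, "cd8": 1},
}
counts = apply_phenotypes_sketch(cells, rules)
print(counts)
```

In this sketch the input DataFrame is modified in place and only the counts are returned, mirroring the documented return type.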
- spac.phenotyping.assign_manual_phenotypes(data_df, phenotypes_df, annotation='manual_phenotype', prefix='', suffix='', multiple=True, drop_binary_code=True)
Assign manual phenotypes to the DataFrame and generate summaries.
- Parameters:
data_df (pandas.DataFrame) – The DataFrame to which manual phenotypes will be assigned.
phenotypes_df (pandas.DataFrame) – A DataFrame containing phenotype definitions with columns:
- “phenotype_name” (str) – The name of the phenotype.
- “phenotype_code” (str) – The code used to decode the phenotype.
annotation (str, optional) – The name of the column to store the combined phenotype. Default is “manual_phenotype”.
prefix (str, optional) – Prefix to be added to the column names. Default is ‘’.
suffix (str, optional) – Suffix to be added to the column names. Default is ‘’.
multiple (bool, optional) – Whether to concatenate the names of multiple positive phenotypes. Default is True.
drop_binary_code (bool, optional) – Whether to drop the binary phenotype columns. Default is True.
- Returns:
A dictionary with the following keys:
- “phenotypes_counts” (dict) – Counts of cells matching each defined phenotype.
- “assigned_phenotype_counts” (dict) – Counts of cells matching different numbers of phenotypes.
- “multiple_phenotypes_summary” (pandas.DataFrame) – Summary of cells with multiple phenotypes.
- Return type:
dict
Notes
The function generates a combined phenotype column, prints summaries of cells matching multiple phenotypes, and returns a dictionary with detailed counts and summaries.
Examples
Suppose data_df is a DataFrame with binary phenotype columns and phenotypes_df contains the following definitions:
>>> import pandas as pd
>>> from spac.phenotyping import assign_manual_phenotypes
>>> data_df = pd.DataFrame({
...     'cd4_phenotype': [0, 1, 0, 1],
...     'cd8_phenotype': [0, 0, 1, 1]
... })
>>> phenotypes_df = pd.DataFrame([
...     {"phenotype_name": "cd4_cells", "phenotype_code": "cd4+"},
...     {"phenotype_name": "cd8_cells", "phenotype_code": "cd8+"},
...     {"phenotype_name": "cd4_cd8", "phenotype_code": "cd4+cd8+"}
... ])
>>> result = assign_manual_phenotypes(
...     data_df,
...     phenotypes_df,
...     annotation="manual",
...     prefix='',
...     suffix='_phenotype',
...     multiple=True
... )
The data_df DataFrame will be edited in place to include a new column “manual” with the combined phenotype labels:
>>> print(data_df)
   cd4_phenotype  cd8_phenotype              manual
0              0              0            no_label
1              1              0           cd4_cells
2              0              1           cd8_cells
3              1              1  cd8_cells, cd4_cd8
The result dictionary contains counts and summaries as follows:
>>> print(result["phenotypes_counts"])
{'cd4_cells': 1, 'cd8_cells': 2, 'cd4_cd8': 1}
>>> print(result["assigned_phenotype_counts"])
0    1
1    2
2    1
Name: num_phenotypes, dtype: int64
>>> print(result["multiple_phenotypes_summary"])
               manual  count
0  cd8_cells, cd4_cd8      1
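The "assigned_phenotype_counts" entry can be pictured as a tally of how many phenotypes each cell matched. The sketch below shows one way to compute such a tally with pandas. The binary columns are hypothetical values chosen to agree with the example above, and this is not necessarily how SPAC derives the summary.

```python
import pandas as pd

# Hypothetical per-phenotype binary columns, chosen to agree with the
# printed results in the example above.
binary = pd.DataFrame({
    "cd4_cells": [0, 1, 0, 0],
    "cd8_cells": [0, 0, 1, 1],
    "cd4_cd8": [0, 0, 0, 1],
})

# Number of phenotypes matched by each cell (row).
num_phenotypes = binary.sum(axis=1)

# How many cells matched 0, 1, 2, ... phenotypes.
tally = num_phenotypes.value_counts().sort_index()
print(tally.to_dict())

# Per-phenotype totals, comparable to "phenotypes_counts".
per_phenotype = binary.sum().to_dict()
print(per_phenotype)
```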
- spac.phenotyping.combine_phenotypes(data_df, phenotype_columns, multiple=True)
Combine multiple binary phenotype columns into a new column in a vectorized manner.
- Parameters:
data_df (pandas.DataFrame) – DataFrame containing the phenotype columns.
phenotype_columns (list of str) – List of binary phenotype column names.
multiple (bool, optional) – Whether to concatenate the names of multiple positive phenotypes. If False, rows that are positive for more than one phenotype are labeled “no_label”. Default is True.
- Returns:
A Series representing the combined phenotype for each row.
- Return type:
pandas.Series
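To make the labeling rules concrete, here is a minimal sketch of combining binary columns into one label per row. It uses a plain Python loop for readability rather than the vectorized approach the function describes, and the helper name `combine_phenotypes_sketch` is hypothetical. The “no_label” fallback for rows with no positive phenotype and the “, ” separator are taken from the assign_manual_phenotypes example in this module.

```python
import pandas as pd


def combine_phenotypes_sketch(data_df, phenotype_columns, multiple=True):
    """Hypothetical, loop-based stand-in; not SPAC's vectorized code."""
    flags = data_df[phenotype_columns].astype(bool)
    labels = []
    for row in flags.itertuples(index=False):
        positives = [name for name, hit in zip(phenotype_columns, row) if hit]
        if not positives:
            # No phenotype matched this row.
            labels.append("no_label")
        elif len(positives) > 1 and not multiple:
            # Several phenotypes matched, but concatenation is disabled.
            labels.append("no_label")
        else:
            labels.append(", ".join(positives))
    return pd.Series(labels, index=data_df.index)


df = pd.DataFrame({"a": [0, 1, 0, 1], "b": [0, 0, 1, 1]})
combined = combine_phenotypes_sketch(df, ["a", "b"])
single_only = combine_phenotypes_sketch(df, ["a", "b"], multiple=False)
print(combined.tolist())
print(single_only.tolist())
```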
- spac.phenotyping.decode_phenotype(data, phenotype_code, **kwargs)
Convert a phenotype code into a dictionary that maps each feature (marker) name to its classification as ‘+’ or ‘-’.
- Parameters:
data (pandas.DataFrame) – The DataFrame containing the columns that will be used to decode the phenotype.
phenotype_code (str) – The phenotype code string, which should end with ‘+’ or ‘-’.
**kwargs (keyword arguments) – Optional keyword arguments to specify prefix and suffix to be added to the column names:
- prefix (str, optional) – Prefix to be added to the column names for the feature classification. Default is ‘’.
- suffix (str, optional) – Suffix to be added to the column names for the feature classification. Default is ‘’.
- Returns:
A dictionary where the keys are column names and the values are the corresponding phenotype classification.
- Return type:
dict
- Raises:
ValueError – If the phenotype code does not end with ‘+’ or ‘-’, or if any columns specified in the phenotype code do not exist in the DataFrame.
Notes
The function splits the phenotype code on ‘+’ and ‘-’ characters to determine the phenotype columns and values. It checks if the columns exist in the DataFrame and whether they are binary or string types to properly map values.
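The splitting step described in the notes can be sketched with a regular expression. This is a simplified, hypothetical helper (`split_phenotype_code`), not the library function: it skips the check that the columns exist in the DataFrame and the type-dependent value mapping, and it keeps the raw ‘+’ or ‘-’ sign as the value, which is an assumption made for illustration.

```python
import re


def split_phenotype_code(phenotype_code, prefix="", suffix=""):
    """Hypothetical helper: split 'cd4+cd8-' into {column: sign}."""
    if not phenotype_code or phenotype_code[-1] not in "+-":
        raise ValueError("phenotype code must end with '+' or '-'")
    # Each token is a marker name followed by a single '+' or '-'.
    pairs = re.findall(r"([^+-]+)([+-])", phenotype_code)
    # Assumption: values are the raw sign symbols.
    return {f"{prefix}{marker}{suffix}": sign for marker, sign in pairs}


decoded = split_phenotype_code("cd4+cd8-", suffix="_phenotype")
print(decoded)
```

Because the split happens on every ‘+’ and ‘-’, marker names that themselves contain one of these characters would not survive this sketch intact.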
- spac.phenotyping.generate_phenotypes_dict(data_df, phenotypes_df, prefix='', suffix='')
Generate a dictionary of phenotype names to their corresponding decoding rules.
- Parameters:
data_df (pandas.DataFrame) – The DataFrame containing the columns that will be used to decode the phenotypes.
phenotypes_df (pandas.DataFrame) – A DataFrame containing phenotype definitions with columns:
- “phenotype_name” (str) – The name of the phenotype.
- “phenotype_code” (str) – The code used to decode the phenotype.
prefix (str, optional) – Prefix to be added to the column names. Default is ‘’.
suffix (str, optional) – Suffix to be added to the column names. Default is ‘’.
- Returns:
A dictionary where the keys are phenotype names and the values are dictionaries mapping column names to values.
- Return type:
dict
Notes
The function iterates over each row in the phenotypes_df DataFrame and decodes the phenotype using the decode_phenotype function.
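The row-by-row iteration can be sketched as follows. Both helpers are hypothetical stand-ins written for this example: `decode_sketch` is a simplified decoder that ignores the DataFrame checks performed by decode_phenotype and keeps the raw sign as the value, and the sample definitions reuse those from the assign_manual_phenotypes example.

```python
import re

import pandas as pd


def decode_sketch(code, prefix="", suffix=""):
    """Hypothetical, simplified decoder; values are raw '+'/'-' signs."""
    pairs = re.findall(r"([^+-]+)([+-])", code)
    return {f"{prefix}{marker}{suffix}": sign for marker, sign in pairs}


def generate_phenotypes_dict_sketch(phenotypes_df, prefix="", suffix=""):
    """Hypothetical stand-in: map each phenotype name to its decoded rules."""
    return {
        row["phenotype_name"]: decode_sketch(
            row["phenotype_code"], prefix, suffix
        )
        for _, row in phenotypes_df.iterrows()
    }


phenotypes_df = pd.DataFrame([
    {"phenotype_name": "cd4_cells", "phenotype_code": "cd4+"},
    {"phenotype_name": "cd8_cells", "phenotype_code": "cd8+"},
    {"phenotype_name": "cd4_cd8", "phenotype_code": "cd4+cd8+"},
])
rules = generate_phenotypes_dict_sketch(phenotypes_df, suffix="_phenotype")
print(rules)
```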
- spac.phenotyping.is_binary_0_1(column)
Check if a pandas Series contains only binary values (0 and 1).
- Parameters:
column (pandas.Series) – The pandas Series to check.
- Returns:
True if the Series contains only 0 and 1, False otherwise.
- Return type:
bool
Notes
The function considers a Series to be binary if it contains exactly the values 0 and 1, and no other values.
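One way to read the note above is as a set-equality test on the unique values. The sketch below follows that reading, which is an assumption: under it, a column holding only zeros (or only ones) is not considered binary. The helper name `is_binary_0_1_sketch` is hypothetical and this may differ from the library's actual check.

```python
import pandas as pd


def is_binary_0_1_sketch(column):
    """Hypothetical check: unique values must be exactly {0, 1}."""
    return set(column.unique()) == {0, 1}


mixed = is_binary_0_1_sketch(pd.Series([0, 1, 1, 0]))
with_two = is_binary_0_1_sketch(pd.Series([0, 1, 2]))
strings = is_binary_0_1_sketch(pd.Series(["+", "-", "+"]))
all_zero = is_binary_0_1_sketch(pd.Series([0, 0, 0]))
print(mixed, with_two, strings, all_zero)
```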