summarize_dataframe
- spac.data_utils.summarize_dataframe(df: DataFrame, columns, print_nan_locations: bool = False) → dict
Summarize specified columns in a DataFrame.
For numeric columns, computes summary statistics. For categorical columns, returns unique labels and frequencies. In both cases, missing values (None/NaN) are flagged and their row indices identified.
- Parameters:
df (pd.DataFrame) – The DataFrame to summarize.
columns (str or list of str) – The column name or list of column names to analyze.
print_nan_locations (bool, optional) – If True, prints the row indices where None/NaN values occur. Default is False.
- Returns:
A dictionary where each key is a column name and its value is another dictionary with:
'data_type': either 'numeric' or 'categorical'
'missing_count': int, the number of missing (None/NaN) values
'missing_indices': list of row indices with missing values
'summary': summary statistics if numeric, or unique labels with counts if categorical
- Return type:
dict
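- Example:
The nested return structure described above can be illustrated with a minimal pandas-only sketch. This is not SPAC's implementation: `summarize_columns_sketch` is a hypothetical stand-in, the use of `Series.describe()` for the numeric summary is an assumption (the exact statistics SPAC reports are not listed here), and the column names `area` and `phenotype` are invented for the demonstration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def summarize_columns_sketch(df, columns):
    """Hypothetical stand-in that mirrors the documented return structure."""
    # Accept a single column name or a list of names, as documented.
    if isinstance(columns, str):
        columns = [columns]

    result = {}
    for col in columns:
        series = df[col]
        missing_mask = series.isna()  # True where the value is None/NaN

        if is_numeric_dtype(series):
            data_type = "numeric"
            # Assumption: describe() stands in for the unspecified statistics.
            summary = series.describe().to_dict()
        else:
            data_type = "categorical"
            # Unique labels with their frequencies, missing values excluded.
            summary = series.value_counts(dropna=True).to_dict()

        result[col] = {
            "data_type": data_type,
            "missing_count": int(missing_mask.sum()),
            "missing_indices": series.index[missing_mask.to_numpy()].tolist(),
            "summary": summary,
        }
    return result


# Illustrative data: one numeric and one categorical column, each with a gap.
df = pd.DataFrame({
    "area": [10.0, None, 30.0, 20.0],
    "phenotype": ["T cell", "B cell", None, "T cell"],
})
info = summarize_columns_sketch(df, ["area", "phenotype"])
print(info["phenotype"])
```

With the real function, the equivalent call would be `summarize_dataframe(df, ["area", "phenotype"])`, and each column's entry would be read the same way, for example `info["area"]["missing_indices"]`.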