summarize_dataframe

spac.data_utils.summarize_dataframe(df: DataFrame, columns, print_nan_locations: bool = False) → dict

Summarize specified columns in a DataFrame.

For numeric columns, computes summary statistics. For categorical columns, returns the unique labels and their frequencies. In both cases, missing values (None/NaN) are counted and the row indices where they occur are recorded.
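The behavior described above can be approximated with plain pandas. The following is a hypothetical re-creation, not the spac source. Several choices are assumptions: the name `summarize_dataframe_sketch`, the use of `pd.api.types.is_numeric_dtype` to decide between numeric and categorical, `Series.describe()` for the numeric statistics, `Series.value_counts()` for label frequencies, and the wording of the printed message. The real function may use different statistics or type rules.

```python
import pandas as pd


def summarize_dataframe_sketch(df, columns, print_nan_locations=False):
    """Hypothetical re-creation of the documented behavior (not spac's code)."""
    # Accept a single column name or a list of names, as documented.
    if isinstance(columns, str):
        columns = [columns]

    results = {}
    for col in columns:
        series = df[col]
        # isna() treats both None and NaN as missing.
        missing_mask = series.isna()
        missing_indices = series.index[missing_mask].tolist()
        if print_nan_locations and missing_indices:
            # Assumed message format; the real output may differ.
            print(f"Column '{col}' has missing values at rows: {missing_indices}")

        if pd.api.types.is_numeric_dtype(series):
            data_type = "numeric"
            # describe() skips NaN and reports count, mean, std, min, quartiles, max.
            summary = series.describe().to_dict()
        else:
            data_type = "categorical"
            # value_counts() drops missing values by default.
            summary = series.value_counts().to_dict()

        results[col] = {
            "data_type": data_type,
            "missing_count": int(missing_mask.sum()),
            "missing_indices": missing_indices,
            "summary": summary,
        }
    return results


# Usage: one numeric and one categorical column, each with one missing value.
df = pd.DataFrame({"age": [30, None, 45], "group": ["a", "b", None]})
summary = summarize_dataframe_sketch(df, ["age", "group"])
print(summary["age"]["missing_indices"])    # [1]
print(summary["group"]["missing_indices"])  # [2]
```

Keying the result by column name mirrors the documented return shape, so callers can look up any requested column directly.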

Parameters:

df (pd.DataFrame) – The DataFrame to summarize.
columns (str or list of str) – The column name or list of column names to analyze.
print_nan_locations (bool, optional) – If True, prints the row indices where None/NaN values occur. Default is False.

Returns:

A dictionary where each key is a column name and its value is another dictionary with:

'data_type': either 'numeric' or 'categorical'

'missing_count': int

'missing_indices': list of row indices with missing values

'summary': summary statistics if numeric, or unique labels with their counts if categorical

Return type:

dict
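Because the return value is a plain nested dictionary, downstream code can inspect it directly. The sketch below shows one way to list the columns that contain missing values, using only the documented keys. The helper name `columns_with_missing` is hypothetical. The input dictionary is written by hand to match the documented shape, and its 'summary' entries are placeholders, not real output from spac.

```python
def columns_with_missing(summary):
    """Return {column: missing_indices} for columns with any missing values.

    Assumes `summary` follows the documented shape:
    {column: {"data_type", "missing_count", "missing_indices", "summary"}}.
    """
    return {
        col: info["missing_indices"]
        for col, info in summary.items()
        if info["missing_count"] > 0
    }


# Hand-written illustrative input; "summary" values are placeholders.
example = {
    "age": {"data_type": "numeric", "missing_count": 1,
            "missing_indices": [1], "summary": {"mean": 37.5}},
    "group": {"data_type": "categorical", "missing_count": 0,
              "missing_indices": [], "summary": {"a": 2, "b": 1}},
}

print(columns_with_missing(example))  # {'age': [1]}
```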