summarize_dataframe

spac.data_utils.summarize_dataframe(df: DataFrame, columns, print_nan_locations: bool = False) → dict

Summarize specified columns in a DataFrame.

For numeric columns, computes summary statistics. For categorical columns, returns unique labels and frequencies. In both cases, missing values (None/NaN) are flagged and their row indices identified.
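The per-column logic described above can be sketched as follows. This is a minimal illustration, not spac's actual implementation: the helper name `summarize_column` is hypothetical, and it assumes numeric columns are summarized with pandas' `describe()` and categorical columns with `value_counts()`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def summarize_column(series: pd.Series) -> dict:
    """Hypothetical sketch of a per-column summary (not spac's actual code)."""
    # Boolean mask of None/NaN entries; index labels give the row locations
    missing_mask = series.isna()
    missing_indices = series.index[missing_mask].tolist()

    if is_numeric_dtype(series):
        data_type = "numeric"
        # describe() skips NaN and reports count, mean, std, min, quartiles, max
        summary = series.describe().to_dict()
    else:
        data_type = "categorical"
        # Frequency of each unique non-missing label
        summary = series.value_counts(dropna=True).to_dict()

    return {
        "data_type": data_type,
        "missing_count": int(missing_mask.sum()),
        "missing_indices": missing_indices,
        "summary": summary,
    }
```

Here the dtype check alone decides between the numeric and categorical branches; the real function may use a different rule.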

Parameters:
  • df (pd.DataFrame) – The DataFrame to summarize.

  • columns (str or list of str) – The column name or list of column names to analyze.

  • print_nan_locations (bool, optional) – If True, prints the row indices where None/NaN values occur. Default is False.
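The handling of `columns` (a single name or a list of names) and of `print_nan_locations` can be sketched like this. The helper `summarize_columns` is hypothetical, and raising `ValueError` for unknown column names is an assumption of the sketch, not documented spac behavior; the per-column statistics are omitted to keep the focus on argument handling.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame, columns, print_nan_locations: bool = False) -> dict:
    """Hypothetical sketch of column normalization and NaN-location printing."""
    # Accept a single column name or a list of column names
    if isinstance(columns, str):
        columns = [columns]

    # Assumption: unknown columns are reported as an error
    unknown = [c for c in columns if c not in df.columns]
    if unknown:
        raise ValueError(f"Columns not found in DataFrame: {unknown}")

    result = {}
    for col in columns:
        # Row index labels where the value is None/NaN
        nan_rows = df.index[df[col].isna()].tolist()
        if print_nan_locations and nan_rows:
            print(f"Column '{col}' has missing values at rows: {nan_rows}")
        result[col] = {"missing_count": len(nan_rows), "missing_indices": nan_rows}
    return result
```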

Returns:

A dictionary where each key is a column name and its value is another dictionary with:

  • 'data_type': either 'numeric' or 'categorical'

  • 'missing_count': int

  • 'missing_indices': list of row indices with missing values

  • 'summary': summary statistics if numeric, or unique labels with counts if categorical
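A short sketch of consuming a dictionary in the documented return shape, for example to list only the columns that contain missing values. The helper `columns_with_missing` is hypothetical, and the `example` dictionary is hand-written for illustration; it is not real output of `summarize_dataframe`.

```python
def columns_with_missing(summary: dict) -> dict:
    """Map each column that has missing values to its missing row indices.

    `summary` is assumed to follow the documented return structure:
    {column: {'data_type', 'missing_count', 'missing_indices', 'summary'}}.
    """
    return {
        col: info["missing_indices"]
        for col, info in summary.items()
        if info["missing_count"] > 0
    }


# Illustrative, hand-written input in the documented shape (not real output)
example = {
    "age": {"data_type": "numeric", "missing_count": 1,
            "missing_indices": [3], "summary": {"mean": 41.5}},
    "cell_type": {"data_type": "categorical", "missing_count": 0,
                  "missing_indices": [], "summary": {"T": 5, "B": 2}},
}
print(columns_with_missing(example))  # {'age': [3]}
```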

Return type:

dict