compute_boxplot_metrics

spac.utils.compute_boxplot_metrics(data: DataFrame, annotation=None, showfliers: bool | None = None)

Compute boxplot-related statistical metrics for a given dataset efficiently.

Statistics include:
  • Lower and upper whiskers (whislo, whishi),

  • First quartile (q1),

  • Median (median),

  • Third quartile (q3),

  • Mean (mean),

  • Outliers (fliers), included only if showfliers is not None.

Outlier handling is controlled by the ‘showfliers’ parameter. To keep large datasets manageable, outliers can optionally be downsampled.
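As a rough illustration of what the statistics listed above mean, here is a minimal standard-library sketch for a single list of values. It is not the spac implementation. The helper name `boxplot_stats_sketch`, the `whisker_factor` argument, and the use of Tukey's 1.5 × IQR rule for whiskers and outliers are all assumptions made for the example; the documentation above does not state which whisker rule is used.

```python
import statistics


def boxplot_stats_sketch(values, whisker_factor=1.5):
    """Illustrative only; hypothetical helper, not part of spac.

    Needs at least two values, because statistics.quantiles requires them.
    """
    ordered = sorted(values)
    # Quartiles by linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # ASSUMPTION: Tukey fences at whisker_factor * IQR beyond the quartiles.
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    return {
        "whislo": inside[0],   # smallest value inside the fences
        "q1": q1,
        "median": median,
        "q3": q3,
        "whishi": inside[-1],  # largest value inside the fences
        "mean": statistics.fmean(ordered),
        "fliers": [v for v in ordered if v < lower_fence or v > upper_fence],
    }


example = boxplot_stats_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(example)
```

In this toy input the value 100 lies above the upper fence, so it is reported under "fliers" rather than stretching the upper whisker.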

Parameters:
  • data (pd.DataFrame) – A pandas DataFrame containing the numerical data for which the boxplot statistics are to be computed.

  • annotation (str, optional) – The annotation used to group the features.

  • showfliers ({None, "downsample", "all"}, default = None) – Defines how outliers are handled. If ‘all’, all outliers are displayed in the boxplot. If ‘downsample’, outliers are downsampled to 10% of their original count when there are more than 10,000 of them. If None, outliers are hidden.
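The ‘downsample’ rule described above can be sketched as follows. This is a hedged illustration, not the library's code: the helper name `downsample_fliers`, its `threshold`, `fraction`, and `seed` arguments, and the choice of uniform random sampling are assumptions, since the documentation only states the 10,000 threshold and the 10% target.

```python
import random


def downsample_fliers(fliers, threshold=10_000, fraction=0.10, seed=0):
    """Hypothetical helper mirroring the documented 'downsample' rule."""
    fliers = list(fliers)
    # At or below the threshold, every outlier is kept.
    if len(fliers) <= threshold:
        return fliers
    # Above the threshold, keep about `fraction` of the original count.
    keep = max(1, round(len(fliers) * fraction))
    # ASSUMPTION: uniform random sampling without replacement;
    # the seed only makes this sketch reproducible.
    return random.Random(seed).sample(fliers, keep)


many_outliers = list(range(20_000))
kept = downsample_fliers(many_outliers)
print(len(kept))  # 2000
```

With exactly 10,000 outliers nothing is dropped, because the documented condition is strictly "more than 10k".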

Returns:

metrics – A DataFrame with one row per feature/annotation grouping and columns holding the calculated metrics.

Return type:

pd.DataFrame