compute_boxplot_metrics
- spac.utils.compute_boxplot_metrics(data: DataFrame, annotation=None, showfliers: bool | None = None)
Compute boxplot-related statistical metrics for a given dataset efficiently.
- Statistics include:
Lower and upper whiskers (whislo, whishi),
First quartile (q1),
Median (median),
Third quartile (q3),
Mean (mean),
Outliers (fliers), only when showfliers is not None.
Outliers are reported according to the ‘showfliers’ parameter. For large datasets, the outliers can be downsampled so that the result stays small.
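The statistics listed above can be illustrated for a single numeric column. This is a minimal sketch, not the function's actual implementation: the helper name `boxplot_stats_sketch` is hypothetical, and the whisker rule (Tukey's 1.5 × IQR fences) is an assumption, since this page does not state which rule is used.

```python
import numpy as np

def boxplot_stats_sketch(values, whis=1.5):
    """Hypothetical helper: boxplot statistics for one 1-D array.

    Assumes Tukey-style fences at q1 - whis*IQR and q3 + whis*IQR.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # ignore missing values

    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr

    # Whiskers reach the most extreme data points that lie inside the fences.
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    # Everything outside the fences counts as an outlier.
    fliers = x[(x < lo_fence) | (x > hi_fence)]

    return {
        "whislo": inside.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whishi": inside.max(),
        "mean": x.mean(),
        "fliers": fliers,
    }

stats = boxplot_stats_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this toy input the value 100 falls above the upper fence, so it is the only entry in `fliers`, and the upper whisker stops at 9.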
- Parameters:
data (pd.DataFrame) – A pandas DataFrame containing the numerical data for which the boxplot statistics are to be computed.
annotation (str, optional) – The annotation used to group the features.
showfliers ({None, "downsample", "all"}, default = None) – Defines how outliers are handled. If ‘all’, all outliers are displayed in the boxplot. If ‘downsample’, outliers are downsampled to 10% of their original count when there are more than 10,000 of them. If None, outliers are hidden.
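The three showfliers modes described above can be sketched as follows. The helper `select_fliers` and its `seed` argument are hypothetical, and uniform random sampling without replacement is an assumption: this page gives the 10,000 threshold and the 10% target, but not how the retained outliers are chosen.

```python
import numpy as np

OUTLIER_LIMIT = 10_000  # threshold taken from the parameter description

def select_fliers(fliers, showfliers=None, seed=0):
    """Hypothetical helper: choose which outliers to keep for one box."""
    fliers = np.asarray(fliers)

    if showfliers is None:
        # Outliers are hidden: return an empty array of the same dtype.
        return fliers[:0]
    if showfliers == "all":
        return fliers
    if showfliers == "downsample":
        if fliers.size > OUTLIER_LIMIT:
            # Keep 10% of the outliers (assumed: uniform random sample).
            rng = np.random.default_rng(seed)
            n_keep = fliers.size // 10
            return rng.choice(fliers, size=n_keep, replace=False)
        return fliers
    raise ValueError(f"Unsupported showfliers value: {showfliers!r}")

many = np.arange(20_000)
kept = select_fliers(many, showfliers="downsample")
```

With 20,000 outliers the sketch keeps 2,000 of them. With exactly 10,000 the threshold is not exceeded, so nothing is dropped.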
- Returns:
metrics – A dataframe with one row per feature/annotation grouping and columns for the calculated metrics.
- Return type:
pd.DataFrame
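To make the "one row per feature/annotation grouping" layout concrete, the sketch below builds a table of that shape with plain pandas. It is an illustration under assumptions, not the library's code: the column names `marker_a`, `marker_b`, `cell_type`, and `feature` are invented for the example, the annotation is assumed to be a column of the input DataFrame, and only a subset of the metrics is computed.

```python
import pandas as pd

# Toy input: two numeric features plus one annotation column (all hypothetical).
data = pd.DataFrame({
    "marker_a": [1.0, 2.0, 3.0, 4.0],
    "marker_b": [10.0, 20.0, 30.0, 40.0],
    "cell_type": ["T", "T", "B", "B"],
})

# Reshape to long format so each measurement carries its feature name.
long_form = data.melt(
    id_vars="cell_type", var_name="feature", value_name="value"
)

# Group by annotation value and feature, then compute each metric.
grouped = long_form.groupby(["cell_type", "feature"])["value"]
metrics = pd.DataFrame({
    "q1": grouped.quantile(0.25),
    "median": grouped.median(),
    "q3": grouped.quantile(0.75),
    "mean": grouped.mean(),
}).reset_index()
```

Two annotation values and two features give four rows, each holding the metrics for one box in the plot.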