API Reference

This page provides the complete API documentation for PySODMetrics.

Core Metrics Module

class py_sod_metrics.sod_metrics.Fmeasure(beta: float = 0.3)[source]

Bases: object

F-measure evaluator for salient object detection.

Computes precision, recall, and F-measure at multiple thresholds, supporting both adaptive and dynamic evaluation modes.

```
@inproceedings{Fmeasure,
    title = {Frequency-tuned salient region detection},
    author = {Achanta, Radhakrishna and Hemami, Sheila and Estrada, Francisco and S{\"u}sstrunk, Sabine},
    booktitle = CVPR,
    number = {CONF},
    pages = {1597--1604},
    year = {2009}
}
```

__init__(beta: float = 0.3)[source]

Initialize the F-measure evaluator.

Parameters:

beta (float) – The weight of precision in the F-measure (β^2 in the formula). Defaults to 0.3.

step(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Accumulate the metric statistics for a pair of pred and gt.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

cal_adaptive_fm(pred: ndarray, gt: ndarray) float[source]

Calculate the adaptive F-measure.

Returns:

adaptive_fm

Return type:

float

cal_pr(pred: ndarray, gt: ndarray) tuple[source]

Calculate the corresponding precision and recall when the threshold changes from 0 to 255.

These precisions and recalls can be used to obtain the mean F-measure, maximum F-measure, precision-recall curve and F-measure-threshold curve.

For convenience, changeable_fms is provided here, which can be used directly to obtain the mean F-measure, maximum F-measure and F-measure-threshold curve.

Returns:

(precisions, recalls, changeable_fms)

Return type:

tuple

get_results() dict[source]

Return the results about F-measure.

Returns:

dict(fm=dict(adp=adaptive_fm, curve=changeable_fm), pr=dict(p=precision, r=recall))
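
Example

A minimal usage sketch (the image paths are placeholders for uint8 grayscale maps):

```python
import cv2

from py_sod_metrics.sod_metrics import Fmeasure

FM = Fmeasure(beta=0.3)
# Placeholder paths; pred and gt must share the same shape.
pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE)
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE)
FM.step(pred=pred, gt=gt)

results = FM.get_results()
print("adaptive F-measure:", results["fm"]["adp"])
print("mean/max F-measure:", results["fm"]["curve"].mean(), results["fm"]["curve"].max())
```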

class py_sod_metrics.sod_metrics.MAE[source]

Bases: object

Mean Absolute Error.

Computes the MAE between predicted saliency maps and ground truth masks.

```
@inproceedings{MAE,
    title = {Saliency filters: Contrast based filtering for salient region detection},
    author = {Perazzi, Federico and Kr{\"a}henb{\"u}hl, Philipp and Pritch, Yael and Hornung, Alexander},
    booktitle = CVPR,
    pages = {733--740},
    year = {2012}
}
```

__init__()[source]

Initialize the MAE evaluator.

step(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Accumulate the metric statistics for a pair of pred and gt.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

cal_mae(pred: ndarray, gt: ndarray) ndarray[source]

Calculate the mean absolute error.

Returns:

mae

Return type:

np.ndarray

get_results() dict[source]

Return the results about MAE.

Returns:

dict(mae=mae)
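
Example

The same step/get_results pattern applies to MAE; a minimal sketch with placeholder paths:

```python
import cv2

from py_sod_metrics.sod_metrics import MAE

mae_metric = MAE()
pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE)      # placeholder path
mae_metric.step(pred=pred, gt=gt)
print("MAE:", mae_metric.get_results()["mae"])
```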

class py_sod_metrics.sod_metrics.Smeasure(alpha: float = 0.5)[source]

Bases: object

S-measure evaluates foreground maps by considering both object-aware and region-aware structural similarity between prediction and ground truth. It combines object-level and region-level scores to provide a comprehensive assessment of structural quality.

```
@inproceedings{Smeasure,
    title = {Structure-measure: A new way to eval foreground maps},
    author = {Fan, Deng-Ping and Cheng, Ming-Ming and Liu, Yun and Li, Tao and Borji, Ali},
    booktitle = ICCV,
    pages = {4548--4557},
    year = {2017}
}
```

__init__(alpha: float = 0.5)[source]

Initialize S-measure (Structure-measure) evaluator.

Parameters:

alpha (float, optional) – Weight for balancing the object score and the region score. Higher values give more weight to object-level similarity. Valid range: [0, 1]. Defaults to 0.5 for equal weighting.

step(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Accumulate the metric statistics for a pair of pred and gt.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

cal_sm(pred: ndarray, gt: ndarray) float[source]

Calculate the S-measure (Structure-measure) score.

Computes a weighted combination of object-aware and region-aware structural similarity scores. For edge cases (all foreground or all background), returns simplified metrics.

Parameters:
  • pred (np.ndarray) – Normalized prediction map with values in [0, 1].

  • gt (np.ndarray) – Binary ground truth mask.

Returns:

S-measure score in range [0, 1], where higher is better.

Return type:

float

s_object(x: ndarray) float[source]

Calculate object-aware score for a region.

Computes a similarity score that considers both mean and standard deviation of the input region.

Parameters:

x (np.ndarray) – Input region data.

Returns:

Object-aware similarity score.

Return type:

float

object(pred: ndarray, gt: ndarray) float[source]

Calculate the object-level structural similarity score.

Evaluates structural similarity separately for foreground and background regions, then combines them using the ratio of foreground pixels.

Parameters:
  • pred (np.ndarray) – Normalized prediction map with values in [0, 1].

  • gt (np.ndarray) – Binary ground truth mask.

Returns:

Object-level similarity score.

Return type:

float

region(pred: ndarray, gt: ndarray) float[source]

Calculate the region-level structural similarity score.

Divides the image into four quadrants based on the foreground centroid, then calculates SSIM for each quadrant weighted by its area.

Parameters:
  • pred (np.ndarray) – Normalized prediction map with values in [0, 1].

  • gt (np.ndarray) – Binary ground truth mask.

Returns:

Region-level similarity score.

Return type:

float

ssim(pred: ndarray, gt: ndarray) float[source]

Calculate the SSIM (Structural Similarity Index) score.

Computes structural similarity based on luminance, contrast, and structure comparisons between prediction and ground truth regions.

Parameters:
  • pred (np.ndarray) – Prediction region.

  • gt (np.ndarray) – Ground truth region.

Returns:

SSIM score in range [0, 1].

Return type:

float

get_results() dict[source]

Return the results about S-measure.

Returns:

dict(sm=sm)
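
Example

A sketch accumulating over several samples (the list of path pairs is a placeholder):

```python
import cv2

from py_sod_metrics.sod_metrics import Smeasure

SM = Smeasure(alpha=0.5)
# Placeholder list of (prediction, ground truth) path pairs.
pairs = [("pred_0.png", "gt_0.png"), ("pred_1.png", "gt_1.png")]
for pred_path, gt_path in pairs:
    pred = cv2.imread(pred_path, cv2.IMREAD_GRAYSCALE)
    gt = cv2.imread(gt_path, cv2.IMREAD_GRAYSCALE)
    SM.step(pred=pred, gt=gt)
print("dataset S-measure:", SM.get_results()["sm"])
```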

class py_sod_metrics.sod_metrics.Emeasure[source]

Bases: object

E-measure assesses binary foreground map quality by measuring the alignment between prediction and ground truth using an enhanced alignment matrix. It addresses limitations of traditional metrics by considering spatial alignment and local/global pixel matching.

```
@inproceedings{Emeasure,
    title = {Enhanced-alignment Measure for Binary Foreground Map Evaluation},
    author = {Deng-Ping {Fan} and Cheng {Gong} and Yang {Cao} and Bo {Ren} and Ming-Ming {Cheng} and Ali {Borji}},
    booktitle = IJCAI,
    pages = {698--704},
    year = {2018}
}
```

Note

More implementation details: https://www.yuque.com/lart/blog/lwgt38

__init__()[source]

Initialize E-measure (Enhanced-alignment Measure) evaluator.

step(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Accumulate the metric statistics for a pair of pred and gt.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

cal_adaptive_em(pred: ndarray, gt: ndarray) float[source]

Calculate the adaptive E-measure using an adaptive threshold.

Uses twice the mean prediction value as the adaptive threshold to binarize the prediction before computing E-measure.

Parameters:
  • pred (np.ndarray) – Normalized prediction map with values in [0, 1].

  • gt (np.ndarray) – Binary ground truth mask.

Returns:

Adaptive E-measure score.

Return type:

float

cal_changeable_em(pred: ndarray, gt: ndarray) ndarray[source]

Calculate E-measure scores across all thresholds from 0 to 255.

Computes the E-measure for 257 different thresholds, enabling analysis of maximum E-measure, mean E-measure, and E-measure-threshold curves.

Parameters:
  • pred (np.ndarray) – Normalized prediction map with values in [0, 1].

  • gt (np.ndarray) – Binary ground truth mask.

Returns:

Array of 257 E-measure scores corresponding to thresholds [0, 255].

Return type:

np.ndarray

cal_em_with_threshold(pred: ndarray, gt: ndarray, threshold: float) float[source]

Calculate the E-measure for a specific binarization threshold.

Computes enhanced alignment based on four regions: true positives, false positives, false negatives, and true negatives.

Parameters:
  • pred (np.ndarray) – Normalized prediction map with values in [0, 1].

  • gt (np.ndarray) – Binary ground truth mask.

  • threshold (float) – Binarization threshold value.

Returns:

E-measure score for the given threshold.

Return type:

float

Note

Variable naming convention: [pred_attr(fg/bg)]_[gt_attr(fg/bg)]_[meaning]; ‘_’ indicates a don’t-care attribute.

cal_em_with_cumsumhistogram(pred: ndarray, gt: ndarray) ndarray[source]

Calculate the E-measure for each threshold as it varies from 0 to 255, using a cumulative-sum histogram.

Variable naming rules within the function: [pred attribute (foreground fg, background bg)]_[gt attribute (foreground fg, background bg)]_[meaning]

If only pred or gt is considered, the other attribute position is replaced with ‘_’.

generate_parts_numel_combinations(fg_fg_numel, fg_bg_numel, pred_fg_numel, pred_bg_numel)[source]

Generate the number of elements in each part of the image.

Parameters:
  • fg_fg_numel (int) – Number of pixels that are foreground in both the prediction and the ground truth.

  • fg_bg_numel (int) – Number of pixels that are foreground in the prediction but background in the ground truth.

  • pred_fg_numel (int) – Total number of foreground pixels in the prediction.

  • pred_bg_numel (int) – Total number of background pixels in the prediction.

Returns:

A tuple containing the number of elements in each part of the image.

Return type:

tuple

get_results() dict[source]

Return the results about E-measure.

Returns:

dict(em=dict(adp=adaptive_em, curve=changeable_em))
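
Example

A minimal sketch (placeholder paths); the curve entry holds 257 threshold-wise scores:

```python
import cv2

from py_sod_metrics.sod_metrics import Emeasure

EM = Emeasure()
pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE)
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE)
EM.step(pred=pred, gt=gt)

results = EM.get_results()
print("adaptive E-measure:", results["em"]["adp"])
print("mean/max E-measure:", results["em"]["curve"].mean(), results["em"]["curve"].max())
```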

class py_sod_metrics.sod_metrics.WeightedFmeasure(beta: float = 1)[source]

Bases: object

Weighted F-measure considers both pixel dependency and pixel importance when evaluating foreground maps. It weights different pixels according to their distance from the foreground boundary to provide a more perceptually meaningful assessment than standard F-measure.

```
@inproceedings{wFmeasure,
    title = {How to eval foreground maps?},
    author = {Margolin, Ran and Zelnik-Manor, Lihi and Tal, Ayellet},
    booktitle = CVPR,
    pages = {248--255},
    year = {2014}
}
```

__init__(beta: float = 1)[source]

Initialize Weighted F-measure evaluator.

Parameters:

beta (float, optional) – Weight for balancing precision and recall. Defaults to 1 for equal weighting (F1-score).

step(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Accumulate the metric statistics for a pair of pred and gt.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

cal_wfm(pred: ndarray, gt: ndarray) float[source]

Calculate the weighted F-measure score.

Implements the weighted F-measure algorithm, which considers:

  1. Pixel dependency: uses the error at the closest GT edge for background pixels.

  2. Pixel importance: weights errors by their distance from the foreground.

Parameters:
  • pred (np.ndarray) – Normalized prediction map with values in [0, 1].

  • gt (np.ndarray) – Binary ground truth mask.

Returns:

Weighted F-measure score based on weighted precision and recall.

Return type:

float

matlab_style_gauss2D(shape: tuple = (7, 7), sigma: int = 5) ndarray[source]

Generate a 2D Gaussian kernel compatible with MATLAB’s fspecial.

Creates a normalized 2D Gaussian kernel that matches MATLAB’s fspecial(‘gaussian’, [shape], sigma) output.

Parameters:
  • shape (tuple, optional) – Kernel size as (height, width). Defaults to (7, 7).

  • sigma (int, optional) – Standard deviation of the Gaussian. Defaults to 5.

Returns:

Normalized 2D Gaussian kernel.

Return type:

np.ndarray

get_results() dict[source]

Return the results about weighted F-measure.

Returns:

dict(wfm=weighted_fm)
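
Example

A minimal sketch with placeholder paths:

```python
import cv2

from py_sod_metrics.sod_metrics import WeightedFmeasure

WFM = WeightedFmeasure(beta=1)
pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE)
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE)
WFM.step(pred=pred, gt=gt)
print("weighted F-measure:", WFM.get_results()["wfm"])
```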

class py_sod_metrics.sod_metrics.HumanCorrectionEffortMeasure(relax: int = 5, epsilon: float = 2.0)[source]

Bases: object

Human Correction Effort Measure for Dichotomous Image Segmentation.

```
@inproceedings{HumanCorrectionEffortMeasure,
    title = {Highly Accurate Dichotomous Image Segmentation},
    author = {Xuebin Qin and Hang Dai and Xiaobin Hu and Deng-Ping Fan and Ling Shao and Luc Van Gool},
    booktitle = ECCV,
    year = {2022}
}
```

__init__(relax: int = 5, epsilon: float = 2.0)[source]

Initialize the Human Correction Effort Measure.

Parameters:
  • relax (int, optional) – Number of boundary relaxation iterations used to tolerate small deviations. Defaults to 5.

  • epsilon (float, optional) – RDP approximation tolerance used when counting polygon control points. Defaults to 2.0.

step(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Accumulate the metric statistics for a pair of pred and gt.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

cal_hce(pred: ndarray, gt: ndarray) float[source]

Calculate the Human Correction Effort (HCE) for a pair of prediction and ground truth.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

Returns:

The HCE value.

Return type:

float

filter_conditional_boundary(contours: list, mask: ndarray, condition: ndarray)[source]

Filter boundary segments based on a given condition mask and compute the number of independent connected regions that require human correction.

Parameters:
  • contours (List[np.ndarray]) – List of boundary contours (OpenCV format).

  • mask (np.ndarray) – Binary mask representing the region of interest.

  • condition (np.ndarray) – Condition mask used to determine which boundary points need to be considered.

Returns:

  • boundaries (List[np.ndarray]): Filtered boundary segments that require correction.

  • independent_count (int): Number of independent connected regions that need correction (i.e., human editing effort).

Return type:

Tuple[List[np.ndarray], int]

count_polygon_control_points(boundaries: list, epsilon: float = 1.0) int[source]

Approximate each boundary using the Ramer-Douglas-Peucker (RDP) algorithm and count the total number of control points of all approximated polygons.

Parameters:
  • boundaries (List[np.ndarray]) – List of boundary contours. Each contour is an Nx1x2 numpy array (OpenCV contour format).

  • epsilon (float) – RDP approximation tolerance. Larger values result in fewer control points.

Returns:

The total number of control points across all approximated polygons.

Return type:

int

Reference:

https://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm

get_results() dict[source]

Return the results about HCE.

Returns:

dict(hce=hce)
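
Example

A minimal sketch with placeholder paths; lower HCE indicates less correction effort:

```python
import cv2

from py_sod_metrics.sod_metrics import HumanCorrectionEffortMeasure

HCEM = HumanCorrectionEffortMeasure(relax=5, epsilon=2.0)
pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE)
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE)
HCEM.step(pred=pred, gt=gt)
print("HCE:", HCEM.get_results()["hce"])
```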

FmeasureV2 Module

class py_sod_metrics.fmeasurev2.IOUHandler(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True)[source]

Bases: _BaseHandler

Intersection over Union.

iou = tp / (tp + fp + fn)

__call__(tp, fp, tn, fn)[source]

Calculate IoU from confusion matrix components.

class py_sod_metrics.fmeasurev2.SpecificityHandler(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True)[source]

Bases: _BaseHandler

Specificity.

True negative rate (TNR)/specificity (SPC)/selectivity

specificity = tn / (tn + fp)

__call__(tp, fp, tn, fn)[source]

Calculate specificity from confusion matrix components.

py_sod_metrics.fmeasurev2.TNRHandler

alias of SpecificityHandler

class py_sod_metrics.fmeasurev2.DICEHandler(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True)[source]

Bases: _BaseHandler

DICE.

dice = 2 * tp / (2 * tp + fn + fp)

__call__(tp, fp, tn, fn)[source]

Calculate DICE coefficient from confusion matrix components.

class py_sod_metrics.fmeasurev2.OverallAccuracyHandler(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True)[source]

Bases: _BaseHandler

Overall Accuracy.

oa = overall_accuracy = (tp + tn) / (tp + fp + tn + fn)

__call__(tp, fp, tn, fn)[source]

Calculate overall accuracy from confusion matrix components.

class py_sod_metrics.fmeasurev2.KappaHandler(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True)[source]

Bases: _BaseHandler

Kappa Accuracy.

kappa = (oa - p_e) / (1 - p_e)

p_e = [(tp + fp)(tp + fn) + (tn + fn)(tn + fp)] / (tp + fp + tn + fn)^2

__init__(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True)[source]

Initialize the Kappa handler.

Parameters:
  • with_dynamic (bool, optional) – Record dynamic results for max/avg/curve versions.

  • with_adaptive (bool, optional) – Record adaptive results for adp version.

  • with_binary (bool, optional) – Record binary results for binary version.

  • sample_based (bool, optional) – Whether to average the metric of each sample or calculate the metric of the dataset. Defaults to True.

__call__(tp, fp, tn, fn)[source]

Calculate Kappa coefficient from confusion matrix components.

class py_sod_metrics.fmeasurev2.PrecisionHandler(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True)[source]

Bases: _BaseHandler

Precision.

precision = tp / (tp + fp)

__call__(tp, fp, tn, fn)[source]

Calculate precision from confusion matrix components.

class py_sod_metrics.fmeasurev2.RecallHandler(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True)[source]

Bases: _BaseHandler

Recall.

True positive rate (TPR)/recall/sensitivity (SEN)/probability of detection/hit rate/power

recall = tp / (tp + fn)

__call__(tp, fp, tn, fn)[source]

Calculate recall from confusion matrix components.

py_sod_metrics.fmeasurev2.TPRHandler

alias of RecallHandler

py_sod_metrics.fmeasurev2.SensitivityHandler

alias of RecallHandler

class py_sod_metrics.fmeasurev2.FPRHandler(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True)[source]

Bases: _BaseHandler

False Positive Rate.

False positive rate (FPR)/probability of false alarm/fall-out

fpr = fp / (tn + fp)

__call__(tp, fp, tn, fn)[source]

Calculate false positive rate from confusion matrix components.

class py_sod_metrics.fmeasurev2.BERHandler(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True)[source]

Bases: _BaseHandler

Balance Error Rate.

ber = 1 - 0.5 * (tp / (tp + fn) + tn / (tn + fp))

__call__(tp, fp, tn, fn)[source]

Calculate balanced error rate from confusion matrix components.

class py_sod_metrics.fmeasurev2.FmeasureHandler(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True, beta: float = 0.3)[source]

Bases: _BaseHandler

F-measure.

fmeasure = (beta + 1) * precision * recall / (beta * precision + recall)

__init__(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, sample_based: bool = True, beta: float = 0.3)[source]

Initialize the F-measure handler.

Parameters:
  • with_dynamic (bool, optional) – Record dynamic results for max/avg/curve versions.

  • with_adaptive (bool, optional) – Record adaptive results for adp version.

  • with_binary (bool, optional) – Record binary results for binary version.

  • sample_based (bool, optional) – Whether to average the metric of each sample or calculate the metric of the dataset. Defaults to True.

  • beta (float, optional) – β^2 in the F-measure formula. Defaults to 0.3.

__call__(tp, fp, tn, fn)[source]

Calculate F-measure from confusion matrix components.

Note

Uses separate precision and recall calculations to maintain consistency with original implementation rather than combined formula.

class py_sod_metrics.fmeasurev2.FmeasureV2(metric_handlers: dict | None = None)[source]

Bases: object

Enhanced F-measure evaluator with support for multiple evaluation metrics.

This class provides a flexible framework for computing various binary classification metrics including precision, recall, specificity, dice, IoU, and F-measure. It supports dynamic thresholding, adaptive thresholding, and binary evaluation modes.

__init__(metric_handlers: dict | None = None)[source]

Initialize the enhanced F-measure evaluator with the given metric handlers, covering related metrics such as precision, recall, specificity, DICE, IoU, and F-measure.

Parameters:

metric_handlers (dict, optional) – Handlers of different metrics. Defaults to None.

add_handler(handler_name, metric_handler)[source]

Add a metric handler to the evaluator.

Parameters:
  • handler_name (str) – Name identifier for the metric handler.

  • metric_handler – Handler instance that computes the specific metric.

static get_statistics(binary: ndarray, gt: ndarray, FG: int, BG: int) dict[source]

Calculate the TP, FP, TN, and FN from an already binarized prediction.

Parameters:
  • binary (np.ndarray) – Binarized prediction containing only {0, 1}.

  • gt (np.ndarray) – gt binarized by 128

  • FG (int) – the number of foreground pixels in gt

  • BG (int) – the number of background pixels in gt

Returns:

TP, FP, TN, FN

Return type:

dict

adaptively_binarizing(pred: ndarray, gt: ndarray, FG: int, BG: int) dict[source]

Calculate the TP, FP, TN, and FN based on an adaptive threshold.

Parameters:
  • pred (np.ndarray) – prediction normalized in [0, 1]

  • gt (np.ndarray) – gt binarized by 128

  • FG (int) – the number of foreground pixels in gt

  • BG (int) – the number of background pixels in gt

Returns:

TP, FP, TN, FN

Return type:

dict

dynamically_binarizing(pred: ndarray, gt: ndarray, FG: int, BG: int) dict[source]

Calculate the corresponding TPs, FPs, TNs, and FNs as the threshold changes from 0 to 255.

Parameters:
  • pred (np.ndarray) – prediction normalized in [0, 1]

  • gt (np.ndarray) – gt binarized by 128

  • FG (int) – the number of foreground pixels in gt

  • BG (int) – the number of background pixels in gt

Returns:

TPs, FPs, TNs, FNs

Return type:

dict

step(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Accumulate statistics for all registered metric handlers for a pair of pred and gt.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

get_results() dict[source]

Return the results of the specific metric names.

Returns:

All results corresponding to different metrics.

Return type:

dict
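
Example

A sketch combining several handlers; the ‘adaptive’/‘dynamic’ sub-keys shown below are assumed to mirror the with_adaptive/with_dynamic options:

```python
import cv2

from py_sod_metrics.fmeasurev2 import (
    DICEHandler,
    FmeasureHandler,
    FmeasureV2,
    IOUHandler,
)

FMv2 = FmeasureV2(
    metric_handlers={
        "fm": FmeasureHandler(with_dynamic=True, with_adaptive=True, beta=0.3),
        "iou": IOUHandler(with_dynamic=True, with_adaptive=True),
        "dice": DICEHandler(with_dynamic=True, with_adaptive=True),
    }
)
pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE)      # placeholder path
FMv2.step(pred=pred, gt=gt)

results = FMv2.get_results()
print("adaptive F-measure:", results["fm"]["adaptive"])
print("max IoU over thresholds:", results["iou"]["dynamic"].max())
```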

Context Measure Module

class py_sod_metrics.context_measure.ContextMeasure(beta2: float = 1.0, alpha: float = 6.0)[source]

Bases: object

Context-measure for evaluating foreground segmentation quality.

This metric evaluates predictions by considering both forward inference (how well predictions align with ground truth) and reverse deduction (how completely ground truth is covered by predictions), using context-aware Gaussian kernels.

```
@article{ContextMeasure,
    title = {Context-measure: Contextualizing Metric for Camouflage},
    author = {Wang, Chen-Yang and Ji, Gepeng and Shao, Song and Cheng, Ming-Ming and Fan, Deng-Ping},
    journal = {arXiv preprint arXiv:2512.07076},
    year = {2025}
}
```

__init__(beta2: float = 1.0, alpha: float = 6.0)[source]

Initialize the Context Measure evaluator.

Parameters:
  • beta2 (float) – Balancing factor between forward inference and reverse deduction. Higher values give more weight to forward inference. Defaults to 1.0.

  • alpha (float) – Scaling factor for Gaussian kernel covariance, controls the spatial context range. Defaults to 6.0.

step(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Accumulate the metric statistics for a pair of pred and gt.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

compute(pred: ndarray, gt: ndarray, cd: ndarray) float[source]

Compute the context measure between prediction and ground truth.

Parameters:
  • pred (np.ndarray) – Prediction map (values between 0 and 1).

  • gt (np.ndarray) – Ground truth map (boolean or 0/1 values).

  • cd (np.ndarray) – Camouflage degree map (values between 0 and 1).

Returns:

Context measure value.

Return type:

float

get_results() dict[source]

Return the results about context measure.

Returns:

dict(cm=context_measure)
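
Example

A minimal sketch with placeholder paths:

```python
import cv2

from py_sod_metrics.context_measure import ContextMeasure

CM = ContextMeasure(beta2=1.0, alpha=6.0)
pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE)
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE)
CM.step(pred=pred, gt=gt)
print("context measure:", CM.get_results()["cm"])
```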

class py_sod_metrics.context_measure.CamouflageContextMeasure(beta2: float = 1.2, alpha: float = 6.0, gamma: int = 8, lambda_spatial: float = 20)[source]

Bases: ContextMeasure

Camouflage Context-measure for evaluating camouflaged object detection quality.

This metric extends the base ContextMeasure by incorporating camouflage degree, which measures how well the foreground blends with its surrounding background. It uses patch-based nearest neighbor matching in Lab color space with spatial constraints to estimate camouflage difficulty.

```
@article{ContextMeasure,
    title = {Context-measure: Contextualizing Metric for Camouflage},
    author = {Wang, Chen-Yang and Ji, Gepeng and Shao, Song and Cheng, Ming-Ming and Fan, Deng-Ping},
    journal = {arXiv preprint arXiv:2512.07076},
    year = {2025}
}
```

__init__(beta2: float = 1.2, alpha: float = 6.0, gamma: int = 8, lambda_spatial: float = 20)[source]

Initialize the Camouflage Context Measure evaluator.

Parameters:
  • beta2 (float) – Balancing factor for forward and reverse. Defaults to 1.2 for camouflage.

  • alpha (float) – Gaussian kernel scaling factor. Defaults to 6.0.

  • gamma (int) – Exponential scaling factor for camouflage degree. Defaults to 8.

  • lambda_spatial (float) – Weight for spatial distance in ANN search. Defaults to 20.

step(pred: ndarray, gt: ndarray, img: ndarray, normalize: bool = True)[source]

Accumulate the metric statistics for a triplet of pred, gt, and img.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • img (np.ndarray) – Original RGB image (required for camouflage degree calculation).

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

get_results() dict[source]

Return the results about camouflage context measure.

Returns:

dict(ccm=camouflage_context_measure)
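
Example

Unlike the base class, step() also needs the original RGB image; a minimal sketch with placeholder paths (OpenCV loads BGR, so the channels are flipped):

```python
import cv2

from py_sod_metrics.context_measure import CamouflageContextMeasure

CCM = CamouflageContextMeasure()
pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE)
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE)
img = cv2.cvtColor(cv2.imread("image.jpg"), cv2.COLOR_BGR2RGB)
CCM.step(pred=pred, gt=gt, img=img)
print("camouflage context measure:", CCM.get_results()["ccm"])
```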

Multi-Scale IoU Module

class py_sod_metrics.multiscale_iou.MSIoU(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, num_levels=10)[source]

Bases: object

Multi-Scale Intersection over Union (MSIoU) metric.

This implements the MSIoU metric which evaluates segmentation quality at multiple scales by comparing edge maps. It addresses the limitation of traditional IoU which struggles with fine structures in segmentation results.

```
@inproceedings{MSIoU,
    title = {Multiscale IOU: A Metric for Evaluation of Salient Object Detection with Fine Structures},
    author = {Ahmadzadeh, Azim and Kempton, Dustin J. and Chen, Yang and Angryk, Rafal A.},
    booktitle = ICIP,
    year = {2021}
}
```

__init__(with_dynamic: bool, with_adaptive: bool, *, with_binary: bool = False, num_levels=10)[source]

Initialize the MSIoU evaluator.

Parameters:
  • with_dynamic (bool, optional) – Record dynamic results for max/avg/curve versions.

  • with_adaptive (bool, optional) – Record adaptive results for adp version.

  • with_binary (bool, optional) – Record binary results for binary version.

  • num_levels (int, optional) – Number of grid scales used in the multi-scale evaluation. Defaults to 10.

get_edge(mask: ndarray)[source]

Edge detection based on the scipy.ndimage.sobel function.

Parameters:

mask – a binary mask of an object whose edges are of interest.

Returns:

a binary mask of 1’s as edges and 0’s as background.

shrink_by_grid(image: ndarray, cell_size: int) ndarray[source]

Shrink the image by summing values within grid cells.

Performs box-counting after applying zero padding if the image dimensions are not perfectly divisible by the cell size.

Parameters:
  • image – The input binary image (edges).

  • cell_size – The size of the grid cells.

Returns:

A shrunk binary image where each pixel represents a grid cell.

multi_scale_iou(pred_edge: ndarray, gt_edge: ndarray) list[source]

Calculate Multi-Scale IoU.

Parameters:
  • pred_edge (np.ndarray) – edge map of pred

  • gt_edge (np.ndarray) – edge map of gt

Returns:

ratios, one IoU value per grid scale

Return type:

list

binarizing(pred_bin: ndarray, gt_edge: ndarray) list[source]

Calculate Multi-Scale IoU based on dynamic thresholding.

Parameters:
  • pred_bin (np.ndarray) – binarized pred

  • gt_edge (np.ndarray) – gt binarized by 128

Returns:

areas under the curve

Return type:

np.ndarray

step(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Calculate the Multi-Scale IoU for a single prediction-ground truth pair.

This method first extracts edges from both prediction and ground truth, then computes IoU ratios at multiple scales defined by self.cell_sizes. Finally, it calculates the area under the curve of these ratios.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

Returns:

The MSIoU score for the given pair (float between 0 and 1).

get_results() dict[source]

Return the results about MSIoU.

Calculates the mean of all stored MSIoU values from previous calls to step().

Returns:

Dictionary with key ‘msiou’ and the mean MSIoU value.

Raises:

ValueError – If no samples have been processed.
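
Example

A minimal sketch with placeholder paths; the constructor flags are set to their plain defaults here, which is an assumption about a reasonable configuration:

```python
import cv2

from py_sod_metrics.multiscale_iou import MSIoU

msiou = MSIoU(with_dynamic=False, with_adaptive=False)
pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE)
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE)
msiou.step(pred=pred, gt=gt)
print("MSIoU:", msiou.get_results()["msiou"])
```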

Size Invariance Module

py_sod_metrics.size_invariance.parse_connected_components(mask: ndarray, area_threshold: float = 50) tuple[source]

Find the connected components in a binary mask.

  1. If there are no connected components, return an empty list.

  2. If all the connected components are smaller than area_threshold, only the largest one is returned.

Parameters:
  • mask (np.ndarray) – binary mask

  • area_threshold (float) – The threshold for the area of the connected components.

Returns:

max_valid_tgt_idx, valid_labeled_mask

Return type:

tuple

py_sod_metrics.size_invariance.encode_bboxwise_tgts_bitwise(max_valid_tgt_idx: int, valid_labeled_mask: ndarray) ndarray[source]

Encode each target bbox region with a bitwise mask.

Parameters:
  • max_valid_tgt_idx (int) – The maximum index of the valid targets.

  • valid_labeled_mask (np.ndarray) – The mask of the valid targets. 0 is background.

Returns:

The size weight for the bbox of each target.

Return type:

np.ndarray

py_sod_metrics.size_invariance.get_kth_bit(n: ndarray, k: int) ndarray[source]

Get the value (0 or 1) in the k-th bit of each element in the array.

Parameters:
  • n (np.ndarray) – The original data array.

  • k (int) – The index of the bit to extract.

Returns:

The extracted data array, with 1 where the k-th bit of the input element is set and 0 otherwise.

Return type:

np.ndarray

class py_sod_metrics.size_invariance.SizeInvarianceFmeasureV2(metric_handlers: dict | None = None)[source]

Bases: FmeasureV2

Size invariance version of FmeasureV2.

This provides size-invariant versions of standard SOD metrics that address the imbalance problem in multi-object salient object detection. Traditional metrics can be biased toward larger objects, while size-invariant metrics ensure fair evaluation across objects of different sizes.

```
@inproceedings{SizeInvarianceVariants,
    title = {Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection},
    author = {Feiran Li and Qianqian Xu and Shilong Bao and Zhiyong Yang and Runmin Cong and Xiaochun Cao and Qingming Huang},
    booktitle = ICML,
    year = {2024}
}
```

step(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Accumulate statistics for all registered metric handlers for a pair of pred and gt.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

get_results() dict[source]

Return the results of the specific metric names.

Returns:

All results corresponding to different metrics.

Return type:

dict

class py_sod_metrics.size_invariance.SizeInvarianceMAE[source]

Bases: MAE

Size invariance version of MAE.

```
@inproceedings{SizeInvarianceVariants,
    title = {Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection},
    author = {Feiran Li and Qianqian Xu and Shilong Bao and Zhiyong Yang and Runmin Cong and Xiaochun Cao and Qingming Huang},
    booktitle = ICML,
    year = {2024}
}
```

step(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Accumulate the metric statistics for a pair of pred and gt.

Parameters:
  • pred (np.ndarray) – Prediction, gray scale image.

  • gt (np.ndarray) – Ground truth, gray scale image.

  • normalize (bool, optional) – Whether to normalize the input data. Defaults to True.

get_results() dict[source]

Return the results about MAE.

Returns:

dict(mae=mae)
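
Example

The size-invariant classes are drop-in replacements for their bases; a minimal sketch for SizeInvarianceMAE with placeholder paths:

```python
import cv2

from py_sod_metrics.size_invariance import SizeInvarianceMAE

SI_MAE = SizeInvarianceMAE()
pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE)
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE)
SI_MAE.step(pred=pred, gt=gt)
print("size-invariant MAE:", SI_MAE.get_results()["mae"])
```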

Utility Functions

py_sod_metrics.utils.validate_and_normalize_input(pred: ndarray, gt: ndarray, normalize: bool = True)[source]

Validate and optionally normalize prediction and ground truth inputs.

This function ensures that prediction and ground truth arrays have compatible shapes and appropriate data types. When normalization is enabled, it converts inputs to the standard format required by the predefined metrics (pred in [0, 1] as float, gt as boolean).

Parameters:
  • pred (np.ndarray) – Prediction array. If normalize=True, should be uint8 grayscale image (0-255). If normalize=False, should be float32/float64 in range [0, 1].

  • gt (np.ndarray) – Ground truth array. If normalize=True, should be uint8 grayscale image (0-255). If normalize=False, should be boolean array.

  • normalize (bool, optional) – Whether to normalize the input data using prepare_data(). Defaults to True.

Returns:

A tuple containing:
  • pred (np.ndarray): Normalized prediction as float64 in range [0, 1].

  • gt (np.ndarray): Normalized ground truth as boolean array.

Return type:

tuple

Raises:
  • ValueError – If prediction and ground truth shapes don’t match, or if prediction values are outside [0, 1] range when normalize=False.

  • TypeError – If data types are invalid when normalize=False (pred must be float32/float64, gt must be boolean).

py_sod_metrics.utils.prepare_data(pred: ndarray, gt: ndarray) tuple[source]

Convert and normalize prediction and ground truth data.

  • For predictions, mimics MATLAB’s mapminmax(im2double(…)).

  • For ground truth, applies binary thresholding at 128.

Parameters:
  • pred (np.ndarray) – Prediction grayscale image, uint8 type with values in [0, 255].

  • gt (np.ndarray) – Ground truth grayscale image, uint8 type with values in [0, 255].

Returns:

A tuple containing:
  • pred (np.ndarray): Normalized prediction as float64 in range [0, 1].

  • gt (np.ndarray): Binary ground truth as boolean array.

Return type:

tuple
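
Example

A sketch with synthetic uint8 arrays standing in for real images:

```python
import numpy as np

from py_sod_metrics.utils import prepare_data

rng = np.random.default_rng(0)
pred = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # fake uint8 prediction
gt = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)    # fake uint8 ground truth

pred_norm, gt_bin = prepare_data(pred, gt)
print(pred_norm.dtype, float(pred_norm.min()), float(pred_norm.max()))  # float64, within [0, 1]
print(gt_bin.dtype)  # bool, thresholded at 128
```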

py_sod_metrics.utils.get_adaptive_threshold(matrix: ndarray, max_value: float = 1) float[source]

Return an adaptive threshold, equal to twice the mean of the matrix and capped at max_value.

Parameters:
  • matrix (np.ndarray) – a data array

  • max_value (float, optional) – the upper limit of the threshold. Defaults to 1.

Returns:

min(2 * matrix.mean(), max_value)

Return type:

float
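
Example

A small worked example of the formula above:

```python
import numpy as np

from py_sod_metrics.utils import get_adaptive_threshold

pred = np.array([[0.1, 0.2], [0.3, 0.2]])  # mean is 0.2
print(get_adaptive_threshold(pred, max_value=1))  # min(2 * 0.2, 1) == 0.4
```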