# Supported Metrics
This page provides detailed information about all the metrics supported by PySODMetrics.
## Overview
PySODMetrics provides two types of metric computation:

- **Sample-based**: metrics are computed for each sample individually and then aggregated
- **Whole-based**: metrics are computed globally across all samples
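In both cases the workflow is the same: call `step()` once per prediction/ground-truth pair, then call `get_results()` to obtain the final scores. A minimal evaluation loop might look like this (the file paths and loading code are illustrative):

```python
import cv2
from py_sod_metrics import MAE, Smeasure

# Illustrative paths; pair each prediction with its ground-truth mask.
pred_paths = ["preds/img_1.png", "preds/img_2.png"]
gt_paths = ["gts/img_1.png", "gts/img_2.png"]

mae, sm = MAE(), Smeasure()
for pred_path, gt_path in zip(pred_paths, gt_paths):
    pred = cv2.imread(pred_path, cv2.IMREAD_GRAYSCALE)
    gt = cv2.imread(gt_path, cv2.IMREAD_GRAYSCALE)
    mae.step(pred, gt)  # accumulate statistics for this sample
    sm.step(pred, gt)

print(f"MAE: {mae.get_results()['mae']:.4f}")
print(f"S-measure: {sm.get_results()['sm']:.4f}")
```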
Most metrics support different aggregation strategies:
- `max`: maximum value across all thresholds
- `avg`: average value across all thresholds
- `adp`: adaptive threshold (2 × mean of the prediction)
- `bin`: binary threshold (typically 0.5 or a fixed threshold)
- `si-*`: size-invariant variants for handling multi-scale objects
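These strategies surface as keys in the dictionary returned by `get_results()`. A sketch using E-measure (the key names follow the usage examples on this page; which keys exist depends on the metric and its configuration):

```python
from py_sod_metrics import Emeasure

em = Emeasure()
em.step(pred, gt)  # pred/gt: grayscale numpy arrays
results = em.get_results()

# Each supported aggregation strategy is a key under the metric name:
print(results["em"]["adp"])  # adaptive-threshold score
print(results["em"]["avg"])  # average over all thresholds
print(results["em"]["max"])  # maximum over all thresholds
```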
## Basic Metrics
### MAE (Mean Absolute Error)
Measures the pixel-wise absolute difference between prediction and ground truth.
\[
\mathrm{MAE} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} \left| P(x, y) - G(x, y) \right|
\]

where \(P\) is the prediction, \(G\) is the ground truth, and \(W \times H\) is the image size.
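In NumPy terms the formula is a plain mean of absolute differences. A minimal sketch, assuming `pred` and `gt` are arrays already normalized to \([0, 1]\):

```python
import numpy as np

# Direct NumPy equivalent of the MAE formula above.
mae_value = np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean()
```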
Reference:
Perazzi et al., “Saliency filters: Contrast based filtering for salient region detection”, CVPR 2012
Usage:
```python
from py_sod_metrics import MAE

mae = MAE()
mae.step(pred, gt)  # pred/gt: grayscale numpy arrays for one sample
results = mae.get_results()
print(f"MAE: {results['mae']:.4f}")
```
### S-measure (Structure Measure)
Evaluates structural similarity between prediction and ground truth, considering both region-aware and object-aware components.
\[
S = \alpha \cdot S_o + (1 - \alpha) \cdot S_r
\]

where \(S_o\) is the object-aware structural similarity, \(S_r\) is the region-aware structural similarity, and \(\alpha \in [0, 1]\) balances the two (0.5 by default).
Reference:
Fan et al., “Structure-measure: A new way to evaluate foreground maps”, ICCV 2017
Usage:
```python
from py_sod_metrics import Smeasure

sm = Smeasure()
sm.step(pred, gt)
results = sm.get_results()
print(f"S-measure: {results['sm']:.4f}")
```
### E-measure (Enhanced-alignment Measure)
Captures both local and global matching information between prediction and ground truth.
Reference:
Fan et al., “Enhanced-alignment Measure for Binary Foreground Map Evaluation”, IJCAI 2018
Usage:
```python
from py_sod_metrics import Emeasure

em = Emeasure()
em.step(pred, gt)
results = em.get_results()
print(f"Adaptive E-measure: {results['em']['adp']:.4f}")
print(f"Avg E-measure: {results['em']['avg']:.4f}")
```
### F-measure
Harmonic mean of precision and recall.
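Concretely, with precision and recall computed from the binarized prediction:

\[
F_\beta = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}
\]

where \(\beta^2\) is commonly set to 0.3 in salient object detection to emphasize precision.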
Reference:
Achanta et al., “Frequency-tuned salient region detection”, CVPR 2009
Usage:
```python
from py_sod_metrics import Fmeasure

fm = Fmeasure()
fm.step(pred, gt)
results = fm.get_results()
print(f"Adaptive F-measure: {results['fm']['adp']:.4f}")
```
### Weighted F-measure
A weighted version of F-measure that assigns different importance to different pixels based on their location.
Reference:
Margolin et al., “How to evaluate foreground maps?”, CVPR 2014
Usage:
```python
from py_sod_metrics import WeightedFmeasure

wfm = WeightedFmeasure()
wfm.step(pred, gt)
results = wfm.get_results()
print(f"Weighted F-measure: {results['wfm']:.4f}")
```
## Advanced Metrics
### FmeasureV2 Framework
A flexible framework for computing multiple binary classification metrics using different handlers.
Supported Handlers:
- `FmeasureHandler`: F-measure with configurable β
- `PrecisionHandler`: Precision (Positive Predictive Value)
- `RecallHandler`: Recall (Sensitivity, TPR)
- `IOUHandler`: Intersection over Union
- `DICEHandler`: Dice coefficient
- `BERHandler`: Balanced Error Rate
- `KappaHandler`: Cohen's Kappa
- `OverallAccuracyHandler`: Overall classification accuracy
- `SpecificityHandler`: Specificity (TNR)
- `SensitivityHandler`: Sensitivity (same as Recall)
- `FPRHandler`: False Positive Rate
- `TNRHandler`: True Negative Rate
- `TPRHandler`: True Positive Rate
Usage:
```python
from py_sod_metrics import FmeasureV2, FmeasureHandler, IOUHandler

fm_v2 = FmeasureV2(
    handlers={
        "fm": FmeasureHandler(beta=0.3),  # weighting between precision and recall
        "iou": IOUHandler(),
    }
)
fm_v2.step(pred, gt)
results = fm_v2.get_results()
```
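Each entry of `results` is keyed by the handler name registered above. A sketch of reading the scores back, assuming the aggregation keys match those used elsewhere on this page:

```python
# Exact key availability depends on how each handler is configured;
# see the comparison table below for the supported strategies.
print(f"F-measure (adp): {results['fm']['adp']:.4f}")
print(f"IoU (adp): {results['iou']['adp']:.4f}")
```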
### Context-Measure
Designed specifically for camouflaged object detection, considering contextual information.
Reference:
Wang et al., “Context-measure: Contextualizing Metric for Camouflage”, arXiv 2025
Variants:
- `ContextMeasure`: standard context measure \(C_\beta\)
- `CamouflageContextMeasure`: weighted context measure \(C^\omega_\beta\)
Usage:
```python
from py_sod_metrics import ContextMeasure, CamouflageContextMeasure

cm = ContextMeasure()
ccm = CamouflageContextMeasure()
cm.step(pred, gt)
ccm.step(pred, gt)
results_cm = cm.get_results()
results_ccm = ccm.get_results()
```
### Multi-Scale IoU (MSIoU)
Evaluates segmentation quality across multiple scales, particularly useful for fine structures.
Reference:
Ahmadzadeh et al., “Multiscale IOU: A Metric for Evaluation of Salient Object Detection with Fine Structures”, ICIP 2021
Usage:
```python
from py_sod_metrics import MSIoU

msiou = MSIoU()
msiou.step(pred, gt)
results = msiou.get_results()
```
### Human Correction Effort Measure
Estimates the effort required for humans to correct prediction errors.
Reference:
Qin et al., “Highly Accurate Dichotomous Image Segmentation”, ECCV 2022
Usage:
```python
from py_sod_metrics import HumanCorrectionEffortMeasure

hcem = HumanCorrectionEffortMeasure()
hcem.step(pred, gt)
results = hcem.get_results()
```
## Size-Invariant Metrics
For datasets containing objects at multiple scales, size-invariant variants provide a more balanced evaluation: errors on small objects are no longer drowned out by large ones, as the sketch below illustrates.
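The core idea, as a rough conceptual sketch (not the library's exact implementation, which follows Li et al. cited below and also accounts for background regions): evaluate each ground-truth object within its own local window and average the per-window scores, so every object contributes equally regardless of its pixel count.

```python
import numpy as np
from scipy import ndimage

def size_invariant_mae_sketch(pred: np.ndarray, gt: np.ndarray) -> float:
    """Conceptual illustration only. pred/gt: float arrays in [0, 1]."""
    labels, num_objects = ndimage.label(gt > 0.5)  # split GT into objects
    if num_objects == 0:
        return float(np.abs(pred - gt).mean())
    window_scores = []
    for window in ndimage.find_objects(labels):  # bounding box per object
        window_scores.append(np.abs(pred[window] - gt[window]).mean())
    # Small objects weigh as much as large ones in the final average.
    return float(np.mean(window_scores))
```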
### Size-Invariant F-measure
Reference:
Li et al., “Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection”, ICML 2024
Usage:
```python
from py_sod_metrics import SizeInvarianceFmeasureV2

si_fm = SizeInvarianceFmeasureV2()
si_fm.step(pred, gt)
results = si_fm.get_results()
print(f"SI F-measure (avg): {results['fm']['si-avg']:.4f}")
```
### Size-Invariant MAE
Usage:
```python
from py_sod_metrics import SizeInvarianceMAE

si_mae = SizeInvarianceMAE()
si_mae.step(pred, gt)
results = si_mae.get_results()
```
## Metric Comparison Table

| Metric | Sample-based | Whole-based |
|---|---|---|
| MAE | soft, si-soft | — |
| S-measure | soft | — |
| Weighted F-measure | soft | — |
| Human Correction Effort | soft | — |
| Context-Measure | soft | — |
| Multi-Scale IoU | max, avg, adp, bin | — |
| E-measure | max, avg, adp | — |
| F-measure (V2) | max, avg, adp, bin, si | bin, si |
| BER, Dice, IoU, Precision, Recall, etc. | max, avg, adp, bin, si | bin, si |
## Notes

- **soft**: metrics that operate directly on continuous prediction values, without thresholding
- **si-**: size-invariant variants that normalize by object size
- **adp**: adaptive thresholding based on prediction statistics (the 2 × mean rule above)
- For detailed mathematical formulations, please refer to the original papers cited above.