Test scores

When you test a model with a dataset that includes the ground truth, scores are calculated based on the project type or algorithm. The Test results panel with the scores is displayed on the right by default. To hide it, select the test results icon on the toolbar.

Scores by project type:

Image classification (EfficientNet):
- Accuracy: Percentage of correctly classified samples out of all samples.

Image classification (AI-ADC EfficientNet):
- Accuracy: Percentage of correctly classified samples out of all samples.
- Macro average precision: Average percentage of correct predictions across all classes the model was trained on, treating each class equally. Calculated by summing precision values for each trained class and dividing by the number of trained classes.
- Macro average recall: Average percentage of actual samples correctly identified across all classes the model was trained on, treating each class equally. Calculated by summing recall values for each trained class and dividing by the number of trained classes.
- Unknown rate: Percentage of samples marked as unknown.
- Precision per class: Percentage of correct predictions for each individual class. Only displayed for classes the model was trained on.
- Recall per class: Percentage of actual samples correctly identified for each individual class. Only displayed for classes the model was trained on.

Multilabel classification:
- Accuracy: Percentage of correct label predictions per sample, considering multiple labels per sample.

Object detection:
- mAP50: Mean average precision, where a predicted box counts as correct when its intersection over union (IoU) with a ground truth box is at least 50%.

Instance segmentation:
- mAP50: Mean average precision, where a predicted mask counts as correct when its intersection over union (IoU) with a ground truth mask is at least 50%.

Semantic segmentation:
- mIOU: Mean intersection over union (IoU) between predicted and true segments, averaged across all classes the model was trained on.
- IOU per class: IoU measured for each class the model was trained on.
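
The "overlap of at least 50%" used by mAP50 is conventionally measured as intersection over union (IoU). The following is a minimal Python sketch of that check for two axis-aligned boxes. The function names and the (x_min, y_min, x_max, y_max) box format are assumptions made for illustration, not the product's implementation or API.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max).

    Hypothetical helper: the box format is an assumption for this sketch.
    """
    # Corners of the intersection rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, truth_box):
    """True when the predicted box overlaps the ground truth with IoU >= 0.5."""
    return box_iou(pred_box, truth_box) >= 0.5
```

Under this sketch, a prediction shifted by 2 units against a 10 x 10 ground truth box still counts as a match (IoU of 80 / 120), while one shifted by 5 units does not (IoU of 50 / 150).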

Class filtering in test metrics

For AI-ADC EfficientNet, PIDNet, and DINOv3 models, test metrics are calculated based on the classes the model was trained on:

  • Macro average metrics (macro average precision, macro average recall, mIOU): These metrics are computed using only the classes present in the training dataset. For example, if your model was trained on "green" and "red" classes with recall values of 100% and 50% respectively, the macro average recall would be (100 + 50) / 2 = 75%. Classes present in the test dataset but not in the training dataset are excluded from this calculation, so that classes the model cannot predict do not lower the averages.

  • Per-class metrics (precision per class, recall per class, IOU per class): Only classes the model was trained on are displayed in the test results panel. This ensures the metrics reflect the model's actual performance on classes it can recognize.

  • Class list: The ground truth dataset may include additional classes that the model was not trained on. These classes will still be visible in the Classes panel and in the confusion matrix, as they are part of the test data and help identify samples that the model cannot classify correctly.

  • Confusion matrix: The confusion matrix displays all classes present in the ground truth, including those the model was not trained on. This ensures that every sample is accounted for and none is dropped from the visualization.
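
The class filtering described above can be sketched in Python as follows. This is an illustrative sketch only: the function names, the sample labels, and the extra "blue" class are hypothetical, and the way a trained class with no test samples is handled is an assumption. It reproduces the 75% macro average recall from the "green" and "red" example while keeping the untrained class in the confusion matrix counts.

```python
from collections import Counter


def recall_per_class(y_true, y_pred, trained_classes):
    """Recall for each trained class: correct predictions / actual samples."""
    actual = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Assumption: a trained class with no test samples is skipped here;
    # the product may handle that case differently.
    return {c: correct[c] / actual[c] for c in trained_classes if actual[c] > 0}


def macro_average(per_class):
    """Unweighted mean of the per-class values (each class counts equally)."""
    return sum(per_class.values()) / len(per_class) if per_class else 0.0


def confusion_counts(y_true, y_pred):
    """Counts of (ground truth, prediction) pairs over ALL ground truth classes."""
    return Counter(zip(y_true, y_pred))


# Hypothetical data: the model was trained on "green" and "red" only,
# but the test dataset also contains a "blue" sample.
trained = ["green", "red"]
y_true = ["green", "green", "red", "red", "blue"]
y_pred = ["green", "green", "red", "green", "red"]

recalls = recall_per_class(y_true, y_pred, trained)  # "blue" is excluded
macro_recall = macro_average(recalls)                # (1.0 + 0.5) / 2
matrix = confusion_counts(y_true, y_pred)            # "blue" row is kept
```

Here the "blue" sample does not affect the per-class or macro average recall, but it still appears in `matrix` as a ("blue", "red") entry, so the confusion matrix totals match the number of test samples.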