Table 1 shows the pixel count per class (PCC) distribution across the dataset classes. ‘Grass’ has the most pixels (51556), while ‘Disease’ has the fewest (4140). ‘Healthy_Apple’ also contributes a notably large share of pixels, whereas ‘Cloud’, ‘Leaf’ and ‘Stem’ have moderate PCC values, indicating a moderate contribution to the overall pixel distribution.
The bar chart depicted in Fig 2 visually represents the distribution of pixel count across different classes, offering insights into the relative presence of each class within the dataset.
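A table of pixel counts such as Table 1 can be generated directly from the ground-truth label masks. The snippet below is a minimal sketch, assuming the masks are stored as integer label images indexed in the (illustrative) class order shown; the function name is hypothetical.

```python
import numpy as np

# Illustrative class order; the actual indexing follows the dataset definition.
CLASSES = ["Cloud", "Disease", "Leaf", "Healthy_Apple", "Grass", "Stem"]

def pixel_count_per_class(label_masks):
    """Tally how many pixels of each class appear across all label masks.

    label_masks: iterable of 2-D integer arrays whose values are indices
    into CLASSES.
    """
    counts = np.zeros(len(CLASSES), dtype=np.int64)
    for mask in label_masks:
        # bincount counts occurrences of each class index in the mask
        counts += np.bincount(mask.ravel(), minlength=len(CLASSES))
    return dict(zip(CLASSES, counts.tolist()))
```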
In evaluating the performance of the segmentation model for diseased apple fruit across the six classes (Cloud, Disease, Leaf, Healthy_Apple, Grass and Stem), the key metrics Accuracy, Intersection over Union (IOU) and Mean Boundary F1 Score (Mean BF Score) are employed. The fundamental terms underlying these metrics are defined below; a computational sketch follows the definitions.
TP (True Positives): Number of pixels correctly classified as diseased apple fruit.
TN (True Negatives): Number of pixels correctly classified as not diseased apple fruit (belonging to other classes).
FP (False Positives): Number of pixels incorrectly classified as diseased apple fruit (but belong to other classes).
FN (False Negatives): Number of pixels belonging to diseased apple fruit class but incorrectly classified as other classes.
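For multi-class segmentation masks, these four counts can be obtained per class by treating that class as positive and all other classes as negative. A minimal sketch, with illustrative function and variable names:

```python
import numpy as np

def per_class_counts(pred, gt, class_idx):
    """Return TP, TN, FP, FN for one class in a segmentation result.

    pred, gt:  2-D integer arrays of class indices (same shape), the
               predicted and ground-truth label images.
    class_idx: index of the class of interest, e.g. the 'Disease' class.
    """
    pred_pos = pred == class_idx
    gt_pos = gt == class_idx
    tp = int(np.sum(pred_pos & gt_pos))    # correctly labelled as this class
    tn = int(np.sum(~pred_pos & ~gt_pos))  # correctly labelled as another class
    fp = int(np.sum(pred_pos & ~gt_pos))   # wrongly labelled as this class
    fn = int(np.sum(~pred_pos & gt_pos))   # pixels of this class that were missed
    return tp, tn, fp, fn
```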
Accuracy
Accuracy measures the percentage of correctly classified pixels among all the pixels in the segmentation masks. It is calculated as the ratio of the number of correctly classified pixels to the total number of pixels. In the context of diseased apple fruit segmentation, accuracy indicates how effectively the model identifies the different classes, such as diseased areas, healthy apple regions and background elements like clouds or grass.
For example, the model achieved an accuracy of 0.98237 for the ‘Disease’ class, indicating that it correctly predicted approximately 98.24% of the pixels belonging to this class.
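Interpreted this way, the per-class accuracy is the fraction of a class’s ground-truth pixels that the model labelled correctly. A minimal sketch, reusing the counts from the hypothetical per_class_counts helper above:

```python
def class_accuracy(tp, fn):
    """Fraction of a class's ground-truth pixels predicted correctly,
    e.g. roughly 0.98 for the 'Disease' class in the example above."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0
```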
Intersection over union (IOU)
IOU measures the degree of overlap between the predicted segmentation mask generated by the model and the ground truth mask, which is either manually annotated or represents the true segmentation. It quantifies how well the predicted regions align with the actual regions of interest on apple trees, including disease spots, healthy areas, leaves, stems and potential obstructions like clouds.
For example, an IOU value of 0.90554 for the ‘Healthy_Apple’ class indicates a high degree of overlap between the predicted and ground truth masks: roughly 90.55% of the union of predicted and actual healthy apple pixels is shared by both. This strong IOU value reflects the model’s ability to accurately identify and delineate healthy apple regions, which is crucial for apple disease detection, where distinguishing healthy areas is as important as identifying diseased regions.
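The per-class IOU follows from the same counts; a minimal sketch, with an illustrative usage of the hypothetical helpers above:

```python
def class_iou(tp, fp, fn):
    """Intersection over union for one class: overlapping pixels divided by
    the union of predicted and ground-truth pixels for that class."""
    denom = tp + fp + fn
    return tp / denom if denom > 0 else 0.0

# Illustrative usage (pred and gt are predicted and ground-truth label images):
# tp, tn, fp, fn = per_class_counts(pred, gt, CLASSES.index("Healthy_Apple"))
# print(class_iou(tp, fp, fn))
```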
Mean BF score (Boundary F1 Score)
Mean BF Score assesses the model’s ability to accurately predict object boundaries, such as the boundaries between diseased and healthy areas on the apple fruit. It computes the F1 score for boundary prediction, which is the harmonic mean of boundary precision and recall. A higher Mean BF Score indicates better boundary delineation accuracy, which is crucial for accurately segmenting the different features of interest in apple images.
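A boundary F1 score for a single class can be sketched as follows: extract the boundary pixels of the predicted and ground-truth masks, count how many fall within a small pixel tolerance of the other mask’s boundary, and take the harmonic mean. The tolerance value and the erosion-based boundary extraction below are illustrative assumptions; the exact convention used for Table 2 follows the evaluation toolbox employed.

```python
import numpy as np
from scipy import ndimage

def mask_boundary(mask):
    """Boundary pixels of a binary mask: mask pixels removed by one erosion."""
    mask = mask.astype(bool)
    return mask & ~ndimage.binary_erosion(mask)

def boundary_f1(pred_mask, gt_mask, tolerance=2):
    """BF score for one class: F1 of boundary matching within a pixel tolerance.

    pred_mask, gt_mask: binary masks (one class) of the same shape.
    """
    pred_b = mask_boundary(pred_mask)
    gt_b = mask_boundary(gt_mask)
    if not pred_b.any() or not gt_b.any():
        return 0.0
    # Distance from every pixel to the nearest boundary pixel of the other mask
    dist_to_gt = ndimage.distance_transform_edt(~gt_b)
    dist_to_pred = ndimage.distance_transform_edt(~pred_b)
    precision = np.mean(dist_to_gt[pred_b] <= tolerance)  # predicted boundary near GT boundary
    recall = np.mean(dist_to_pred[gt_b] <= tolerance)     # GT boundary recovered by prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```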
Table 2 shows the comparative analysis of accuracy, IOU and Mean BF Score across six classes and Fig 3 shows the corresponding bar chart.
From Table 2 and the bar chart, it is clear that these performance metrics provide a comprehensive evaluation of PPN-Pixel Pyramid Net’s performance in detecting apple diseases through pixel-level segmentation with SPP. They assess pixel classification accuracy, mask prediction alignment and boundary prediction accuracy, aiding system evaluation and enhancement. For instance, the ‘Healthy_Apple’ class achieved a Mean BF Score of 0.69815, indicating effective boundary prediction. ‘Disease’ and ‘Healthy_Apple’ show high accuracy and IOU, while ‘Cloud’ proves more challenging, with lower accuracy and IOU.
Accuracy, IOU and Mean BF Score focus on individual pixel- or class-level accuracy, while Global Accuracy, Mean Accuracy, Mean IOU, Weighted IOU and Mean BF Score provide aggregate evaluations over multiple instances or classes.
For the task of apple disease detection using PPN-Pixel Pyramid Net, the following aggregate evaluation metrics are employed (a computational sketch follows this list):
Global accuracy
measures the overall proportion of correctly classified pixels across all classes, including ‘Cloud’, ‘Disease’, ‘Leaf’, ‘Healthy_Apple’, ‘Grass’ and ‘Stem’.
Mean accuracy
evaluates average accuracy per class, considering the importance of identifying different areas like diseased, healthy, leaf, grass, stem and cloudy regions.
Mean IOU
quantifies the average intersection over union across all classes, evaluating overall segmentation accuracy.
Weighted IOU
gives more weight to classes with larger pixel counts, such as ‘Healthy_Apple’, ‘Grass’ and ‘Leaf’.
Mean BF Score
evaluates boundary prediction accuracy for each class, crucial for delineating boundaries between healthy and diseased areas, leaves, stems and potential obstructions like clouds.
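All of these aggregate scores except Mean BF Score can be derived from the multi-class pixel confusion matrix; Mean BF Score is instead averaged over the per-class boundary F1 values sketched earlier. Below is a minimal sketch with illustrative names:

```python
import numpy as np

def aggregate_metrics(conf):
    """Aggregate segmentation metrics from a pixel confusion matrix.

    conf[i, j] = number of pixels whose ground-truth class is i and whose
    predicted class is j.
    """
    tp = np.diag(conf).astype(float)
    gt_pixels = conf.sum(axis=1).astype(float)    # pixels per ground-truth class
    pred_pixels = conf.sum(axis=0).astype(float)  # pixels per predicted class
    per_class_acc = tp / gt_pixels
    per_class_iou = tp / (gt_pixels + pred_pixels - tp)
    return {
        "GlobalAccuracy": tp.sum() / conf.sum(),   # all correct pixels / all pixels
        "MeanAccuracy": per_class_acc.mean(),      # unweighted mean over classes
        "MeanIOU": per_class_iou.mean(),
        "WeightedIOU": float(np.sum(per_class_iou * gt_pixels / gt_pixels.sum())),
    }
```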
Table 3 shows the performance metrics used to assess the effectiveness of the PPN-Pixel Pyramid Net model for apple disease detection, and Fig 4 shows the corresponding bar chart.
Han et al. (2021) presented the disease detection results for apple images by visualizing bounding boxes. In their description, ground truth bounding boxes were denoted by dotted lines, while predicted bounding boxes were illustrated with solid-colored lines. Red arrows highlighted false positive detections, instances where the model erroneously identified disease when it was not present, and blue arrows indicated false negative detections, cases where the model failed to detect actual disease.
The proposed PPN-Pixel Pyramid Net method performs well in pixel-level semantic segmentation, reducing false positives and false negatives compared to the Region Aggregated CNN method of Han et al. (2021), as shown in Fig 5.
Table 4 presents the training progress of the semantic segmentation network over multiple epochs, showing how accuracy and loss change as training proceeds. As the number of epochs increases, accuracy improves while loss decreases, indicating that the model is learning and becoming more accurate in its predictions over time.
Fig 6 depicts the evolution of accuracy and loss over the training epochs. At the beginning (epoch 1), accuracy is low and loss is high, indicating poor performance and high uncertainty. As training progresses, accuracy steadily increases while loss gradually decreases.
Table 5 summarizes the precision, recall, interpolated precision at IOU and the area under the precision-recall curve (AUC) for each class, along with the computed average precision over intersection-over-union thresholds (AP). The formula for Average Precision (AP) is:
AP = (1/N) · Σ_{i=1}^{N} AUC-PR_i
where N is the total number of classes and AUC-PR_i is the area under the precision-recall curve for class i.
This formula gives the average of the AUC-PR values across all classes, providing a single scalar metric to evaluate the overall performance of the model across multiple classes.
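Given per-class labels and model confidence scores, this average can be computed as sketched below using scikit-learn’s precision-recall utilities; the data layout and function name are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

def mean_auc_pr(y_true_per_class, scores_per_class):
    """Average of the per-class areas under the precision-recall curve.

    y_true_per_class[i]:  binary labels for class i (1 = sample belongs to class i)
    scores_per_class[i]:  model confidence scores for class i, same length
    """
    auc_pr = []
    for y_true, scores in zip(y_true_per_class, scores_per_class):
        precision, recall, _ = precision_recall_curve(y_true, scores)
        auc_pr.append(auc(recall, precision))  # AUC-PR_i for class i
    return float(np.mean(auc_pr))              # AP = (1/N) * sum_i AUC-PR_i
```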
In this comparison of disease detection models, the Average Precision (AP) metrics shed light on their effectiveness in accurately identifying diseases from images. The proposed PPN-Pixel Pyramid Net emerges as the top performer with an AP of 89.31% and an AP50 score of 89.81%, indicating its ability to achieve high precision and recall at an IOU threshold of 50%. The Region-aggregated attention CNN also demonstrates notable performance, with an AP of 72.26% and a strong AP50 of 88.62%. These results highlight the efficacy of these models in disease detection tasks, offering promising solutions for accurate and efficient diagnosis and treatment planning.
The proposed PPN-Pixel Pyramid Net and the Region-aggregated attention CNN exhibit superior precision and recall compared to other models such as Mask R-CNN, SSD, RetinaNet and YOLOv3. These findings highlight the potential of advanced deep learning architectures in transforming disease detection, which could lead to substantial advancements in early diagnosis and intervention strategies. The comparative performance analysis in Table 6 shows that the PPN-Pixel Pyramid Net model outperforms all others in terms of Average Precision (AP) and AP50.