Dry beans (DB; Phaseolus vulgaris L.) are among the most important legumes in the world, valued for their high nutritional content (vitamins, proteins, minerals and carbohydrates), widespread consumption and large production areas (Long et al., 2019; Suarez-Martinez et al., 2016). DB have several characteristics that contribute significantly to agricultural sustainability across diverse cropping systems. For any agricultural endeavor to thrive, it is crucial to use superior seeds that yield uniform, robust plants on time. Seed quality is determined by genetic, physiological, hygienic and physical characteristics. In 2020, the world’s total DB production and harvested area were 27.5 million metric tonnes and 34.8 million hectares, respectively. DB production has climbed by around 60% since 1990, while the harvested area increased by 36% over the same period (FAO, 2022). According to Vandemark et al. (2017), on-farm yields of dry beans grown in the United States increased by an average of 12.9 kg/ha per year between 1909 and 2012. These gains are attributed mainly to selection for plant type, disease resistance and insect resistance. Siddiq et al. (2022) state that beans are essential for food security and for preventing malnutrition; about 300 million people worldwide eat beans as part of their diet.
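As a quick consistency check on the production figures quoted above, the implied average yield and yield growth can be derived from the reported totals. The snippet below is a minimal sketch: the variable names are illustrative, and the only inputs are the FAO (2022) numbers cited in the text.

```python
# Figures quoted in the text (FAO, 2022).
production_mt = 27.5      # world production in 2020, million metric tonnes
area_mha = 34.8           # world harvested area in 2020, million hectares
production_growth = 0.60  # approximate growth in production since 1990
area_growth = 0.36        # growth in harvested area over the same period

# Average yield in 2020 in tonnes per hectare (the "million" factors cancel).
yield_2020 = production_mt / area_mha

# Yield = production / area, so the growth factors divide.
yield_growth = (1 + production_growth) / (1 + area_growth) - 1

print(f"2020 average yield: {yield_2020:.2f} t/ha")
print(f"Implied yield growth since 1990: {yield_growth:.1%}")
```

Under these figures the average yield is roughly 0.79 t/ha, and yield per hectare grew by roughly 17.6%, which indicates that part of the production increase came from higher yields rather than from expanded area alone.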
Seed quality has a significant impact on crop productivity. Although various computer tools are available to assess the quality of agricultural and food products, most assessments are still carried out by traditional means. For instance, determining the type of dry beans manually requires a knowledgeable person and a significant amount of time, and it relies on human judgment. Classifying seed varieties by eye is challenging because the seeds look similar; without specialized machinery or automated software, it is very difficult for a human operator to sort such seeds reliably. Manual identification is therefore slow and subject to differing interpretations, which creates both commercial and technical problems. In addition, the color of different dry bean varieties can vary, and this information is not captured by geometric data. An automated system for quickly identifying and categorizing seed traits is therefore both economically and technically necessary. This need opens up new applications for computer vision techniques, including image classification, pattern recognition and image segmentation (Li et al., 2018; Zheng et al., 2019; Lu et al., 2022). Combining image processing tools with artificial intelligence (AI) approaches has significantly improved the ability to analyze images and extract valuable information. Traditional seed evaluation methods can be time-consuming and subjective, whereas image processing tools and AI algorithms can automate and enhance seed evaluation (Jiao et al., 2019; Liu et al., 2020; Oguine et al., 2022).
However, these models use only one classifier, which has drawbacks such as overfitting and bias on large datasets. Despite a reduced error rate, they may not provide accurate predictions. This study therefore proposes an ensemble-based dry bean prediction model called Xtreme Stacking Prediction of Dry Beans (X-SPDB). The ensemble method is more stable and more predictive than a single classifier, and it reduces model bias, variance and overfitting while increasing predictive accuracy (Polikar, 2006; Sagi and Rokach, 2018; Dong et al., 2020). X-SPDB also eliminates unnecessary features through feature selection. The heatmap of the dataset shows that only a few features are correlated with the target; the most pertinent features are therefore chosen using Sequential Floating Forward Selection (SFFS).
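To illustrate how SFFS differs from plain forward selection, the following is a minimal, generic sketch rather than the X-SPDB implementation. The function `sffs`, the feature names A to D and the `toy_score` function with its numbers are all hypothetical; in practice the score would typically be something like the cross-validated accuracy of a classifier trained on the candidate feature subset.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Subset = FrozenSet[str]


def sffs(features: Sequence[str],
         score: Callable[[Subset], float],
         k: int) -> Tuple[Subset, float]:
    """Sequential Floating Forward Selection of a subset of size k."""
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    selected: Subset = frozenset()
    best: Dict[int, Tuple[Subset, float]] = {}  # best subset seen per size

    while len(selected) < k:
        # Forward step: add the single feature that helps the most.
        selected = max((selected | {f} for f in features if f not in selected),
                       key=score)
        size, s = len(selected), score(selected)
        if size not in best or s > best[size][1]:
            best[size] = (selected, s)

        # Floating step: drop a feature while doing so beats the best
        # smaller subset recorded so far.
        while len(selected) > 2:
            reduced = max((selected - {f} for f in sorted(selected)), key=score)
            s_red = score(reduced)
            if s_red <= best[len(reduced)][1]:
                break
            selected = reduced
            best[len(reduced)] = (reduced, s_red)

    return best[k]


# Made-up scores for four hypothetical features; not results from the paper.
RELEVANCE = {"A": 50, "B": 40, "C": 35, "D": 5}
# Hypothetical pair effects: A is redundant with B and C; B and C work well together.
PAIR_EFFECT = {frozenset("AB"): -30, frozenset("AC"): -30, frozenset("BC"): 40}


def toy_score(subset: Subset) -> float:
    total = sum(RELEVANCE[f] for f in subset)
    total += sum(v for pair, v in PAIR_EFFECT.items() if pair <= subset)
    return total


chosen, value = sffs(list(RELEVANCE), toy_score, k=3)
print(sorted(chosen), value)  # ['B', 'C', 'D'] 120
```

In this toy setting, plain forward selection would keep the individually strongest feature A and end with {A, B, C} (score 105). The floating step discards A once B and C are both present, which lets the search reach {B, C, D} (score 120).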
The rest of the paper is structured as follows. The literature survey reviews prior research on dry beans, their prediction and their importance. The methodology section gives a detailed description of the model design for prediction. The results section discusses and analyzes the results, and the conclusion summarizes the work.
Literature survey
Several studies, such as those by Khilari et al. (2022) and Gupta and Vanmathi (2021), predict the quality of wine using machine learning algorithms. In these two studies, the random forest (RF) model predicted wine quality correctly in 92% and 80.9% of cases, respectively.
Kayastha et al. (2024) reviewed the fundamentals of precision agriculture, which uses technologies such as GPS, sensors and data analytics to improve resource efficiency and boost crop production. The review emphasizes incorporating sustainable methods within precision agriculture frameworks, stressing the significance of environmental monitoring, soil health and biodiversity preservation. It also underscores the synergy between advanced agricultural technologies and eco-friendly farming approaches, outlining a trajectory for the agricultural sector toward sustainable and resilient nutritional security.
Rajendra Prasad et al. (2024) provided an overview of how Indian seed regulations have contributed to the growth of the Indian seed industry and of the effects of the COVID-19 pandemic on the seed sector. Using common techniques such as linear discriminant analysis (LDA), RF and support vector machine (SVM), De Medeiros et al. (2020) classified soybean seeds and seedlings according to appearance and physiological capacity; Khatri et al. (2022) used K-nearest neighbors (KNN) and Naive Bayes (NB) classifiers to classify the seeds of three wheat varieties; Shingade et al. (2022) investigated the ability of the RF classifier to predict sustainable agricultural yield for a specific year; and Li et al. (2020) employed the upgraded ILEWSM method, together with Otsu segmentation and the normalized spectral ratio, for the visual detection of external flaws and internal quality of apples. Gupta and Vanmathi (2021) applied various machine learning algorithms to predict wine quality; the RF model performed best, achieving approximately 76.4% for white wine prediction and 73.3% for red wine prediction. Although technology for categorizing bean seed species was first developed some years ago, machine learning (ML) and artificial intelligence (AI) are now frequently used in research to identify dry bean seed species.
Kılıç et al. (2007) developed a computer vision system (CVS) for the quality control of beans that considers the dimensions and color of the samples. An artificial neural network (ANN) was used to determine the color of the beans. The samples were divided into five categories according to standards set by the system and by experts, and the ANN was tested on 371 samples.
Venora et al. (2009) recommended using KS-400, a commercial image analysis package, together with linear discriminant analysis (LDA) to categorize six Italian landrace bean varieties. The experiments assessed traits such as the size, shape, color and texture of the grains and achieved a success rate of 99.56%. In a follow-up study on fifteen traditional Italian bean landraces, Venora et al. (2009) reported a success rate of 98.49%. Koklu et al. (2020) developed an AI-based CVS to distinguish common dry bean varieties, as defined by the Turkish Standards Institution, that have physically similar traits but no distinctive color. Several machine learning methods, namely decision tree (DT), SVM, kNN and multilayer perceptron (MLP), were compared using 10-fold cross-validation; the correct classification rates for DT, SVM, kNN and MLP were 92.52%, 93.13%, 87.92% and 91.73%, respectively. Because multiple populations with different genotypes are cultivated, the finished product contains seeds from several species.
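The 10-fold cross-validation protocol used to compare classifiers in such studies can be sketched as follows. This is an illustrative example only, not the pipeline of Koklu et al. (2020): the data are synthetic two-dimensional clusters, the helper names `knn_predict` and `k_fold_accuracy` are hypothetical, and the folds are not stratified by class.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_accuracy(data, n_folds=10, k=3, seed=0):
    """Mean accuracy of kNN over n_folds train/test splits of `data`."""
    data = data[:]                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(hits / len(test))
    return sum(scores) / n_folds


# Synthetic stand-in data: two well-separated 2-D clusters (not bean data).
rng = random.Random(42)
data = [((rng.gauss(0, 0.5), rng.gauss(0, 0.5)), "class_0") for _ in range(50)]
data += [((rng.gauss(10, 0.5), rng.gauss(10, 0.5)), "class_1") for _ in range(50)]

print(f"10-fold accuracy: {k_fold_accuracy(data):.3f}")
```

Each sample is used for testing exactly once, so the mean over the ten folds gives a less optimistic estimate than accuracy measured on the training data. The same loop can be repeated for each candidate classifier to obtain comparable rates.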
Oliveira et al. (2021) classified fermented cocoa beans into four groups using a fast and reliable computer vision system. Predictive features were extracted from the beans and used to identify the samples. Based on digital red, green and blue (RGB) images, they recommended using RF to assess the quality of fermented beans as a cut test. Khan et al. (2023) presented a methodology comprising outlier removal, class balancing using adaptive synthetic (ADASYN) sampling and a procedure to determine the best-performing classifier.
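The two preprocessing steps named above, outlier removal and class balancing, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the method of Khan et al. (2023): the interquartile-range (IQR) rule is assumed for outlier detection, and plain random oversampling is substituted for adaptive synthetic sampling, which would instead generate new minority samples by interpolating between neighbors. The helper names and the data are hypothetical.

```python
import random
import statistics
from collections import Counter, defaultdict


def remove_outliers_iqr(rows, column, factor=1.5):
    """Keep rows whose `column` value lies within the Tukey IQR fences."""
    values = [features[column] for features, _ in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - factor * (q3 - q1), q3 + factor * (q3 - q1)
    return [row for row in rows if low <= row[0][column] <= high]


def oversample_to_balance(rows, seed=0):
    """Duplicate random minority rows until every class matches the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[1]].append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced += group + rng.choices(group, k=target - len(group))
    return balanced


# Hypothetical rows of the form ((feature values...), class label).
rows = [((float(v),), "majority") for v in range(10, 20)]
rows += [((float(v),), "minority") for v in (12, 14, 16)]
rows.append(((500.0,), "majority"))          # an obvious outlier

cleaned = remove_outliers_iqr(rows, column=0)
balanced = oversample_to_balance(cleaned)
print(Counter(label for _, label in balanced))
```

Balancing should be applied to the training portion only; duplicating or synthesizing samples before the train/test split would let copies of the same sample appear on both sides and inflate the measured accuracy.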
Aggarwal et al. (2022) proposed providing farmers with IT-enabled solutions by applying data analytics to collected information. Their web application monitors soil fertility and recommends the most suitable crops for cultivation in a farmer's geographical region. Macuacua et al. (2023) developed a system that automatically classifies seed varieties using different combinations of data techniques. Kim et al. (2024) demonstrated that AI-powered irrigation systems outperform traditional irrigation methods by delivering significant cost savings, enhancing crop yields and promoting water conservation. They describe their study as a landmark in integrating AI into precision agriculture, paving the way for a more sustainable and productive future in legume farming. Setyaningrum et al. (2024) employed a randomized complete block design with a single factor, fertilizer type, at seven levels: inorganic fertilizers (Urea 50 kg/ha, SP36 100 kg/ha and KCl 100 kg/ha), Indigofera tinctoria compost, corncob compost, peanut green manure, chicken manure, goat manure and cow manure (applied at a rate of 5 tons/ha), with each treatment replicated three times.